
Oncofinder, a new method for the analysis of intracellular signaling pathway activation using transcriptomic data.

Anton A Buzdin¹, Alex A Zhavoronkov², Mikhail B Korzinkin³, Larisa S Venkova⁴, Alexander A Zenin⁴, Philip Yu Smirnov⁴, Nikolay M Borisov⁵.

Abstract

We propose a new biomathematical method, OncoFinder, for both quantitative and qualitative analysis of intracellular signaling pathway activation (SPA). The method is universal and may be used to analyze any physiological, stress, malignancy, or other perturbed condition at the molecular level. In contrast to other existing techniques for aggregating and generalizing gene expression data for individual samples, we propose distinguishing the positive/activator and negative/repressor role of every gene product in each pathway. We show that the relative importance of each gene product in a pathway can be assessed using kinetic models of "low-level" protein interactions. Although the importance factors of pathway members cannot yet be established for most signaling pathways because the required experimental data are lacking, we show that ignoring these factors can sometimes be acceptable and that the simplified formula for SPA evaluation may be applied in many cases. We hope that, owing to its universal applicability, OncoFinder will be widely used by the research community.


Keywords: expression level profiling; microchip transcriptome investigation; mitogenic signaling pathways; signalome profiling; stochastic robustness analysis; targeted anti-cancer drugs

Year: 2014 PMID: 24723936 PMCID: PMC3971199 DOI: 10.3389/fgene.2014.00055

Source DB: PubMed Journal: Front Genet ISSN: 1664-8021 Impact factor: 4.599

Intracellular signaling pathways (SPs) regulate numerous processes involved in normal and pathological conditions, including development, growth, aging, and cancer. Many bioinformatic tools that analyze SPs have been developed recently. However, none of them enables efficient high-throughput quantification of pathway activation scores for individual biological samples. Here we propose a method for quick, informative, and large-scale screening of changes in signaling pathway activation (SPA) in cells and tissues. These changes may reflect differences in physiological state, aging, disease, drug treatment, infection, media composition, additives, etc. One potential application of SPA studies is the use of mathematical algorithms to identify and rank medicines by their predicted efficacy. Information about SPA can be obtained from large-scale proteomic or transcriptomic data. Although the proteomic level may be somewhat closer to the biological function of SPA, transcriptomic studies are currently far more feasible in terms of both experimental work and data analysis. Transcriptomic methods such as next-generation sequencing (NGS) or microarray analysis of RNA can routinely determine expression levels for all or virtually all human genes (Shirane et al., 2004). Transcriptome profiling may be performed on minute amounts of tissue, which need not be fresh: clinical formalin-fixed, paraffin-embedded (FFPE) tissue blocks are also suitable. For the molecular analysis of cancer, gene expression can be interpreted in terms of abnormal SPA features of various pro- and antimitotic SPs. Such analysis may improve the clinician's selection of a treatment strategy.
Pro- and antimitotic SPs that determine various stages of cell cycle progression have been in the spotlight of computational biology for more than a decade (Kholodenko et al., 1999; Borisov et al., 2009; Kuzmina and Borisov, 2011). Today, hundreds of SPs and related gene product interaction maps that show sophisticated relationships between individual molecules are cataloged in databases such as UniProt (The UniProt consortium, 2011), HPRD (Mathivanan et al., 2006), QIAGEN SABiosciences (SABiosciences), WikiPathways (Bauer-Mehren et al., 2009), Ariadne Pathway Studio (Nikitin et al., 2003), SPIKE (Elkon et al., 2008), Reactome (Haw and Stein, 2012), KEGG (Nakaya et al., 2013), etc. One group of bioinformatic approaches integrated the analysis of transcriptome-wide data with models employing the mass action law and Michaelis-Menten kinetics (Yizhak et al., 2013). These methods, developed over the last 15 years, remained purely fundamental until recently, primarily because the multiplicity of interaction domains in signal transducer proteins enormously increases interactome complexity (Conzelmann et al., 2006; Borisov et al., 2008). Secondly, a considerable number of unknown free parameters, such as kinetic constants and/or concentrations of protein molecules, significantly complicated SPA analysis. Yizhak et al. (2013) suggested that the clinical efficiency of several drugs, e.g., geroprotectors, may be evaluated as their ability to drive kinetic models of the pathways into the steady state. However, protein-protein interactions have been quantitatively characterized in detail for only a tiny fraction of SPs. This approach is also time-consuming, since processing each transcriptomic dataset requires extensive calculations for the kinetic models (Yizhak et al., 2013).
However, all contemporary bioinformatic methods proposed for digesting large-scale gene expression data and then recognizing and analyzing SPs share an important disadvantage: they do not allow tracing the overall pathway activation signatures or quantitatively estimating the extent of SPA (Kuzmina and Borisov, 2011; Hwang, 2012; Yizhak et al., 2013). This may be because the calculation matrix used to estimate SPA does not define the specific roles of individual gene products in the overall signal transduction process. Here we propose a new method that, to our knowledge, for the first time makes it possible to quantitatively estimate SPA for individual samples based on large-scale gene expression data. The method was previously announced by our team (Zhavoronkov et al., 2014). Theoretically, the signal transduction efficiency at every stage of an SP depends on the concentrations of the interacting gene products. Computational modeling of signal transduction processes indicated that most interacting proteins are present in living cells at concentrations significantly lower than the saturation levels for each transduction step (Birtwistle et al., 2007; Borisov et al., 2009). Our model is based on the correlation between the signal transducer concentrations and the overall SPA. We also determined the overall roles of individual gene products in the functioning of each SP. A gene product may act as a positive or a negative regulator of signal transduction; alternatively, for some proteins the role may be undefined or neutral. Finally, these roles may be characterized quantitatively according to the importance of each interactor for the overall SPA. Determining these roles for each SP is a non-trivial task with several uncertainties.
Namely, protein interactions within each pathway may be competitive or independent and, therefore, belong to a sequential or parallel series of nearby events (Borisov et al., 2006; Conzelmann et al., 2006). The overall graph of protein interaction events may include both sequential (pathway-like) and parallel (network-like) edges (Conzelmann et al., 2006; Borisov et al., 2008). The role of each gene product in signal transduction may depend on whether it works in a sequential or a parallel way. Alternatively, as a rough approximation of this situation, one may propose a simplified method that uses only the overall role of each gene product in the SPA. In this case, each simplified signaling graph includes only two types of branches of the protein interaction chain: one for sequential events that promote SPA, and another for sequential repressor events. Under these conditions, it can be presumed that all activator/repressor members are equally important for the SPA, which leads to the following formula for the overall signal outcome (SO) of a given pathway: $SO = \prod_i [AGEL]_i / \prod_j [RGEL]_j$. Here the multiplication is done over all possible activator and repressor proteins in the pathway, and $[AGEL]_i$ and $[RGEL]_j$ are the relative gene expression levels of activator (i) and repressor (j) members, respectively. To obtain an additive value, one can take the logarithms of the gene expression levels and thus arrive at a function of pathway activation strength, PAS, which operates on experimental datasets obtained by comprehensive gene expression profiling. For a pathway p, $PAS_p = \sum_n ARR_{np} \cdot \lg(CNR_n)$. Here the case-to-norm ratio, $CNR_n$, is the ratio of the expression level of gene n in the sample (e.g., of a cancer patient) to that in the control (e.g., the average value for a healthy group). The discrete value $ARR_{np}$ (activator/repressor role) shows whether the product of gene n promotes SPA in pathway p (1), inhibits it (−1), or plays an intermediate role (0.5, 0, or −0.5, respectively).
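The simplified PAS calculation can be illustrated with a minimal Python sketch. It assumes that PAS for one pathway is the sum, over the pathway genes, of the activator/repressor role (ARR) multiplied by the base-10 logarithm of the case-to-norm ratio (CNR), with "lg" read as log10. The function name, the gene names, and the rule of skipping genes without a CNR value are hypothetical choices made for this illustration, not part of the published method.

```python
import math


def pathway_activation_strength(arr, cnr):
    """Simplified PAS for one pathway: sum of ARR_n * lg(CNR_n).

    arr: dict mapping gene -> role in {-1, -0.5, 0, 0.5, 1}
    cnr: dict mapping gene -> case-to-norm expression ratio (must be > 0)
    Genes without a CNR value are skipped (an assumption of this sketch).
    """
    total = 0.0
    for gene, role in arr.items():
        ratio = cnr.get(gene)
        if ratio is None:
            continue
        if ratio <= 0:
            raise ValueError(f"CNR for {gene} must be positive, got {ratio}")
        total += role * math.log10(ratio)
    return total


# Hypothetical three-gene pathway: an activator, a repressor, a neutral member.
arr = {"GENE_A": 1, "GENE_R": -1, "GENE_N": 0}
cnr = {"GENE_A": 10.0, "GENE_R": 0.1, "GENE_N": 5.0}

pas = pathway_activation_strength(arr, cnr)
print(f"PAS = {pas:.3f}")
```

In this toy case the activator is tenfold up and the repressor tenfold down, so both push the score in the same positive direction, while the neutral gene contributes nothing regardless of its expression change.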
Negative and positive overall PAS values correspond, respectively, to decreased or increased activity of the SP in a sample, with the extent of this activity proportional to the absolute value of PAS. However, the assumption of sequential protein-protein interactions in pathways may seem rather artificial. Although it is difficult to estimate precisely the importance of gene products that act in a pathway in a non-sequential mode, a solution may come from kinetic models of SPA that use the “low-level” approach of the mass action law to describe each act of protein interaction. Some of these models were previously validated experimentally by us and others using Western blot analysis (Kholodenko et al., 1999; Kiyatkin et al., 2006; Birtwistle et al., 2007; Borisov et al., 2009; Kuzmina and Borisov, 2011). Our previous experience suggests that two approaches can be used to estimate the importance of distinct genes/proteins in the pathways. One of them operates with the concept of sensitivity of an ordinary differential equation system to its free parameters (Kholodenko et al., 2003), which is generally applied to kinetic constants but may also be used for operating with the protein concentrations in the kinetic model of a pathway (Kuzmina and Borisov, 2011), according to the formula $w_j^{(1)} = \int_0^T \left| \partial \ln [EFF(t)] / \partial \ln C_j \right| dt$. Here $w_j^{(1)}$ is the importance factor, [EFF(t)] is the time-dependent concentration of the active pathway effector protein (an experimentally traced marker of pathway activation), the upper integration limit T is the time of reaching the steady state, and $C_j$ is the total concentration of protein j. Another way to calculate the importance factor for the gene products deals with the stiffness/sloppiness analysis of the effector activation (Daniels et al., 2008).
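A sensitivity-based importance factor can be sketched numerically on a toy model. The sketch below assumes that the factor for protein j is the integral from 0 to T of the absolute logarithmic sensitivity of [EFF(t)] to the total concentration of protein j; this reading of the sensitivity formula is an assumption. The one-step activation model (a kinase K activating an effector E, a phosphatase P deactivating it), its rate constants, and all function names are hypothetical and are not the EGFR model used by the authors.

```python
import math


def effector(t, conc, k_on=1.0, k_off=1.0):
    """Active effector level in a hypothetical one-step activation model.

    d[EFF]/dt = k_on*C_K*(C_E - [EFF]) - k_off*C_P*[EFF], with [EFF](0) = 0,
    which has the closed-form solution returned here.
    conc: dict with total concentrations for "K", "E" and "P".
    """
    rate = k_on * conc["K"] + k_off * conc["P"]
    steady = k_on * conc["K"] * conc["E"] / rate
    return steady * (1.0 - math.exp(-rate * t))


def log_sensitivity(t, conc, name, rel_step=1e-4):
    """Central finite-difference estimate of d ln[EFF(t)] / d ln C_name."""
    up = dict(conc, **{name: conc[name] * (1.0 + rel_step)})
    down = dict(conc, **{name: conc[name] * (1.0 - rel_step)})
    d_log_eff = math.log(effector(t, up)) - math.log(effector(t, down))
    d_log_c = math.log(1.0 + rel_step) - math.log(1.0 - rel_step)
    return d_log_eff / d_log_c


def sensitivity_weights(conc, t_end, n_steps=2000):
    """Integrate |log-sensitivity| over time with the trapezoidal rule.

    The grid starts at t_end / n_steps rather than 0, because [EFF](0) = 0
    and its logarithm is undefined there.
    """
    dt = t_end / n_steps
    times = [dt * (i + 1) for i in range(n_steps)]
    weights = {}
    for name in conc:
        sens = [abs(log_sensitivity(t, conc, name)) for t in times]
        weights[name] = sum((a + b) * dt / 2.0 for a, b in zip(sens, sens[1:]))
    return weights


weights = sensitivity_weights({"K": 1.0, "E": 2.0, "P": 0.5}, t_end=10.0)
for name, value in weights.items():
    print(name, round(value, 3))
```

In this toy model the effector level scales linearly with its own total concentration, so E receives the largest weight, while the phosphatase, which only matters once the response approaches the steady state, receives the smallest.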
This approach comprises analyzing the Hesse matrix, H, of the squared deviation of [EFF(C, t)] from [EFF]exp, scaled by σ², taken with respect to the components of C. Here C is the vector of total concentrations for every protein in the pathway, [EFF(C, t)] is the concentration of an active pathway effector protein at the time point t, [EFF]exp is the experimentally measured (e.g., by Western blots) total concentration of the effector at the same time, and σ is the experimental error of this measurement. The sloppiness/stiffness analysis looks for the eigenvalues, λ, and eigenvectors, ξ, of the Hesse matrix: $H\xi = \lambda \cdot \xi$. The higher the absolute value of λ, the “stiffer” the direction within the n-dimensional space of C (where n is the number of protein types in the pathway model). The components of the eigenvector along the stiffest direction, ξ, may be used to assess the importance factor of a gene product j in a pathway according to the formula $w_j^{(2)} = |\xi_j|$. Taking into account the above considerations, we arrive at the following final formula for assessing the SPA: $PAS_p^{(1,2)} = \sum_n ARR_{np} \cdot BTIF_n \cdot w_n^{(1,2)} \cdot \lg(CNR_n)$. Here the Boolean flag $BTIF_n$ (beyond tolerance interval flag) indicates that the expression level of gene n in the given sample differs sufficiently from the respective expression level in the reference sample or set of reference samples. For this demonstration of our method we applied two simultaneous restriction/inclusion criteria to the expression of each individual gene: (i) a 50% expression level cut-off relative to the average for the reference set, and (ii) the sample expression level should differ by more than two standard deviations from the average of the reference set. We next explored the effect of introducing the importance factors w into the PAS calculation, compared with the simplified model of PAS evaluation lacking w. Importance factors were calculated using either the sensitivity-based, $w^{(1)}$, or the stiffness-based, $w^{(2)}$, algorithm.
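The gene filter and the weighted score can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: the 50% cut-off is interpreted here as a deviation of at least 50% of the reference mean, the two criteria are combined with a logical AND, CNR is taken against the reference mean, and "lg" is read as log10. The stiffness weights are taken as the absolute components of the eigenvector of a symmetric Hesse matrix with the largest absolute eigenvalue. All function names and the toy numbers are hypothetical.

```python
import numpy as np


def btif(sample, reference, rel_cutoff=0.5, n_sd=2.0):
    """Beyond-tolerance-interval flag per gene (True = gene is counted).

    sample: expression values, shape (n_genes,)
    reference: expression values, shape (n_ref, n_genes), n_ref >= 2
    """
    sample = np.asarray(sample, dtype=float)
    reference = np.asarray(reference, dtype=float)
    mean = reference.mean(axis=0)
    sd = reference.std(axis=0, ddof=1)  # sample standard deviation
    deviation = np.abs(sample - mean)
    return (deviation >= rel_cutoff * mean) & (deviation > n_sd * sd)


def weighted_pas(sample, reference, arr, weights=None):
    """Sum of ARR_n * BTIF_n * w_n * lg(CNR_n) over the pathway genes."""
    sample = np.asarray(sample, dtype=float)
    reference = np.asarray(reference, dtype=float)
    cnr = sample / reference.mean(axis=0)
    if weights is None:
        weights = np.ones_like(cnr)  # simplified model: all w equal 1
    contrib = np.asarray(arr, dtype=float) * np.asarray(weights) * np.log10(cnr)
    return float(np.sum(np.where(btif(sample, reference), contrib, 0.0)))


def stiffness_weights(hessian):
    """|components| of the eigenvector with the largest |eigenvalue|."""
    eigvals, eigvecs = np.linalg.eigh(np.asarray(hessian, dtype=float))
    stiffest = np.argmax(np.abs(eigvals))
    return np.abs(eigvecs[:, stiffest])


# Toy data: four reference samples, three genes (values are made up).
reference = [[100, 50, 20], [110, 55, 22], [90, 45, 18], [100, 50, 20]]
sample = [400, 5, 21]
arr = [1, -1, 1]

flags = btif(sample, reference)
pas = weighted_pas(sample, reference, arr)
print(flags, round(pas, 3))
```

The third gene changes by only 5% relative to the reference mean, so it fails the filter and does not contribute, however large its importance factor might be.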
We performed this verification for the EGFR pathway, for which we previously established and published a kinetic model (Kuzmina and Borisov, 2011). For these two sets of importance factors, and for the w-free model, we performed a computational analysis of nine transcriptomes obtained using microarray hybridization technology for human glioblastoma samples from published datasets (Supplementary dataset 1). The information on SP organization was taken from the Web-based SABiosciences database. The data on ARR were manually curated by analyzing the same database. Our findings suggest that the cloud of values for the ratio $PAS_{EGFR}^{(1)} / PAS_{EGFR}$ (where $PAS_{EGFR}$ is the PAS value for the EGFR pathway in the simplified model, in which all importance factors equal 1) lies within the interval (0.6 ± 0.8), whereas the ratio $PAS_{EGFR}^{(2)} / PAS_{EGFR}$ lies within the interval (1.0 ± 0.8). Overall, we conclude that for a complex SP such as EGFR, which includes >300 gene products, incorporating the importance factors had only a moderate effect on the PAS. This suggests that, in principle, the simplified formula for PAS calculation may be applied for pathway analysis. For the overwhelming majority of SPs, no experimental data are available, which makes it impossible to calculate their importance factors using kinetic models. For these pathways we performed a stochastic robustness analysis using the simplified formula for PAS. We introduced an additional random perturbation factor, w, which served as an analog of the importance factor in PAS evaluation. In our computational simulation, w was log-normally distributed and calculated as $w = 2^x$, where x were normally distributed random numbers with expected value M = 0 and standard deviation σ = 0.5. The random perturbation factors w were applied to the glioblastoma transcriptional dataset GSM215422 (GSM215422 dataset).
Importantly, although the perturbation was performed independently 98 times, with independent weighting factors w for each gene, the standard deviations for the set of alternate PAS (APAS) values were not large enough to mask the proportional trend between the average perturbed PAS and the unperturbed PAS for each of the 68 SPs analyzed in this study (Figure 1; Supplementary dataset 2).
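The stochastic robustness analysis can be sketched in a few lines of Python. The sketch assumes that each gene's term ARR · lg(CNR) is multiplied by an independent factor w = 2^x with x drawn from a normal distribution with mean 0 and standard deviation 0.5, that 98 perturbed scores are generated per pathway, and that "lg" is log10. The function name, the seed, and the toy pathway are hypothetical; the real analysis used the GSM215422 transcriptome and 68 pathways.

```python
import math
import random
import statistics


def perturbed_pas(arr, cnr, n_runs=98, sigma=0.5, seed=0):
    """Return (mean APAS, SD of APAS, unperturbed PAS) for one pathway.

    arr: activator/repressor roles per gene
    cnr: case-to-norm ratios per gene (same order, all > 0)
    Each run draws one independent factor w = 2**x per gene, x ~ N(0, sigma).
    """
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    terms = [role * math.log10(ratio) for role, ratio in zip(arr, cnr)]
    apas = []
    for _ in range(n_runs):
        apas.append(sum((2.0 ** rng.gauss(0.0, sigma)) * term for term in terms))
    return statistics.mean(apas), statistics.stdev(apas), sum(terms)


# Hypothetical pathway: two up-regulated activators, one down-regulated repressor.
mean_apas, sd_apas, pas = perturbed_pas(arr=[1, 1, -1], cnr=[4.0, 2.0, 0.5])
print(f"PAS = {pas:.3f}, APAS = {mean_apas:.3f} +/- {sd_apas:.3f}")
```

Because 2^x is log-normal with median 1, its expected value is slightly above 1, so the mean perturbed score tends to sit a few percent above the unperturbed one when all terms have the same sign; the quantity of interest is whether the spread of APAS stays small compared with the differences between pathways.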

Figure 1

Values of pathway activation strength (PAS). The pathway information was extracted from the SABiosciences database. Primary data are shown in Supplementary dataset 3. For the perturbed values (APAS), both average values (points on the plot) and standard deviation bars are shown.

We propose here a new biomathematical method, OncoFinder, for both quantitative and qualitative analysis of intracellular SP activation. It can be used to analyze any physiological, stress, malignancy, or other perturbed condition at the molecular level. The mathematical algorithm described here enables processing of high-throughput transcriptomic data, and there is no technical obstacle to applying OncoFinder to proteomic datasets as well, once developments in proteomics allow proteome-wide expression datasets to be generated. We hope that, owing to its universal applicability, OncoFinder will be widely used by the biomedical research community and by all those interested in thorough characterization of molecular events in living cells. We also wish to encourage an international scientific partnership aimed at standardized experimental characterization of the importance factors for individual proteins, starting at least with the SPs most relevant to the major aspects of human physiology.

Conflict of interest statement

The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

20 in total

1. Enzymatic production of RNAi libraries from cDNAs.

Authors: Daisuke Shirane; Kohtaroh Sugao; Shigeyuki Namiki; Mao Tanabe; Masamitsu Iino; Kenzo Hirose
Journal: Nat Genet Date: 2004-01-04 Impact factor: 38.330

2. Pathway studio--the analysis and navigation of molecular networks.

Authors: Alexander Nikitin; Sergei Egorov; Nikolai Daraselia; Ilya Mazo
Journal: Bioinformatics Date: 2003-11-01 Impact factor: 6.937

3. Untangling the wires: a strategy to trace functional interactions in signaling and gene networks.

Authors: Boris N Kholodenko; Anatoly Kiyatkin; Frank J Bruggeman; Eduardo Sontag; Hans V Westerhoff; Jan B Hoek
Journal: Proc Natl Acad Sci U S A Date: 2002-09-19 Impact factor: 11.205

4. Scaffolding protein Grb2-associated binder 1 sustains epidermal growth factor-induced mitogenic and survival signaling by multiple positive feedback loops.

Authors: Anatoly Kiyatkin; Edita Aksamitiene; Nick I Markevich; Nikolay M Borisov; Jan B Hoek; Boris N Kholodenko
Journal: J Biol Chem Date: 2006-05-09 Impact factor: 5.157

5. Ongoing and future developments at the Universal Protein Resource.

Authors: The UniProt Consortium
Journal: Nucleic Acids Res Date: 2010-11-04 Impact factor: 16.971

6. An evaluation of human protein-protein interaction data in the public domain.

Authors: Suresh Mathivanan; Balamurugan Periaswamy; T K B Gandhi; Kumaran Kandasamy; Shubha Suresh; Riaz Mohmood; Y L Ramachandra; Akhilesh Pandey
Journal: BMC Bioinformatics Date: 2006-12-18 Impact factor: 3.169

7. SPIKE--a database, visualization and analysis tool of cellular signaling pathways.

Authors: Ran Elkon; Rita Vesterman; Nira Amit; Igor Ulitsky; Idan Zohar; Mali Weisz; Gilad Mass; Nir Orlev; Giora Sternberg; Ran Blekhman; Jackie Assa; Yosef Shiloh; Ron Shamir
Journal: BMC Bioinformatics Date: 2008-02-20 Impact factor: 3.169

8. Comparison and evaluation of pathway-level aggregation methods of gene expression data.

Authors: Seungwoo Hwang
Journal: BMC Genomics Date: 2012-12-13 Impact factor: 3.969

9. Systems-level interactions between insulin-EGF networks amplify mitogenic signaling.

Authors: Nikolay Borisov; Edita Aksamitiene; Anatoly Kiyatkin; Stefan Legewie; Jan Berkhout; Thomas Maiwald; Nikolai P Kaimachnikov; Jens Timmer; Jan B Hoek; Boris N Kholodenko
Journal: Mol Syst Biol Date: 2009-04-07 Impact factor: 11.429

10. Ligand-dependent responses of the ErbB signaling network: experimental and modeling analyses.

Authors: Marc R Birtwistle; Mariko Hatakeyama; Noriko Yumoto; Babatunde A Ogunnaike; Jan B Hoek; Boris N Kholodenko
Journal: Mol Syst Biol Date: 2007-11-13 Impact factor: 11.429
