
Bioinformatic analysis linking genomic defects to chemosensitivity and mechanism of action.

Abstract

A joint analysis of the NCI60 small molecule screening data, their genetically defective genes, and mechanisms of action (MOA) of FDA approved cancer drugs screened in the NCI60 is proposed for identifying links between chemosensitivity, genomic defects and MOA. Self-Organizing-Maps (SOMs) are used to organize the chemosensitivity data. Student's t-tests are used to identify SOM clusters with enhanced chemosensitivity for tumor cell lines with versus without genetically defective genes. Fisher's exact and chi-square tests are used to reveal instances where defective gene to chemosensitivity associations have enriched MOAs. The analysis finds a relatively small set of defective genes, including ABL1, AXL, BRAF, CDC25A, CDKN2A, IGF1R, KRAS, MECOM, MMP1, MYC, NOTCH1, NRAS, PIK3CG, PTK2, RPTOR, SPTBN1, STAT2, TNKS and ZHX2, as possible candidates for roles in chemosensitivity for compound MOAs that target primarily, but not exclusively, kinases, nucleic acid synthesis, protein synthesis, apoptosis and tubulin. These results identify exploitable instances of enhanced chemosensitivity of compound MOAs for selected defective genes. Collectively these findings will advance the interpretation of pre-clinical screening data as well as contribute towards the goals of cancer drug discovery, development decision making, and explanation of drug mechanisms.


Year: 2021 PMID: 33909629 PMCID: PMC8081165 DOI: 10.1371/journal.pone.0243336

Source DB: PubMed Journal: PLoS One ISSN: 1932-6203 Impact factor: 3.240

Introduction

The emergence of extensive human tumor cell line compound screening data, coupled with advances in cancer genomic technologies, has generated comprehensive and complex databases [1]. Strategies for analyzing this data may identify important links between genetic changes that contribute to the hallmarks of cancer biology [2] and the discovery of leads in the pursuit of small-molecule cancer therapy [3]. The present report examines links between genetically defective genes in the National Cancer Institute’s panel of sixty tumor cell lines (referred to hereafter as NCI60), chemosensitivity, as measured by growth inhibition (GI50NCI60; adopting the convention of an under bar to describe the vector of GI50NCI60 (N = 59) measurements for each screened compound) and preferences for mechanisms of action (MOA) of identified linkages. An elegant study by Ikediobi et al. [4] addressed this goal by examining relationships between mutations in 24 cancer genes in the NCI60 tumor cell lines and the GI50NCI60 activity of ~8k screened compounds. Their finding of a strong association between the BRAF mutation (V600E) and the GI50NCI60 activity of phenothiazines supports important links between altered genes, chemosensitivity and MOAs. The current analysis extends this work, with significant differences. GI50NCI60 results for ~53k screened compounds are analyzed (DTP database). A larger set of gene mutations (N = 368) for the NCI60 tumor cell lines are analyzed (CBioPortal database). A novel analysis of GI50NCI60, based on Self-Organizing-Maps (SOMs), emphasizing FDA approved compounds with assigned MOAs in the NCI60 screened compounds, is used to derive links between tumor cell chemosensitivity, genetically altered genes and MOAs. Efforts to develop links between pre-clinical tumor cell screening data, genomic defects and drug mechanisms may contribute to advances in small-molecule cancer therapies. 
An important element of these efforts requires more informed interpretations of small-molecule screening results in the context of genomic profiles and drug action. These associations may yield undiscovered opportunities for drug re-purposing and new applications of gene mutations towards personalized medicine.

Data

Three publicly available data sources are used for this analysis. First, chemosensitivity data consists of the 2019 release of GI50NCI60 measurements from the Developmental Therapeutics Program (DTP) in the National Cancer Institute. Historically the NCI60 screen was designed to identify relationships between chemotypes and cellular responses [5]. Their bulk download(https://dtp.cancer.gov/discovery_development/nci-60) includes GI50 values for 159 tumor cell types. A subset of 70 tumor cell lines, identified previously [6] as representing an information-rich component of this data, consists of ~53k screened compounds, which for this analysis was reduced to 46,798 GI50NCI60 records when filtered for a coefficient of variation above 0.1. Z-score normalized GI50NCI60 measurements of the filtered data (~46K) were used for chemosensitivity analysis. The raw data file is included in the S1 master_appendix sheet GI50. Second, genetic data is obtained from the cBioPortal database (https://www.cbioportal.org/) [7,8]. A total of 368 altered genes are listed for the NCI60; with either a mutation (MUT), copy number alteration (CNA) or fusion/splice (FUSION). These genomic changes are grouped, so that a gene alteration due to any or all types of variations will be designated as genetically defective. In this context, a defective gene indicates only a modification from the wild-type allele. Genes designated as defective genes can have wide-ranging effects including gain-of and/or loss-of gene function. Defective genes occur within each NCI60 tumor cell individually or as pairs, doublets, triplets, etc. S1 Appendix in S1 File displays a histogram for the frequency of defective genes within the NCI60. The highest frequency exists for tumor cell lines having a single defective gene. This frequency decreases progressively down to less than one percent for tumor cell lines sharing 10 defective genes. 
The cumulative frequency of tumor cell lines sharing any defective gene is 0.97, an indication that the probability of tumor cell lines sharing any defective gene approaches one. S2 Appendix in S1 File displays the histogram of defective genes shared between tumor cell lines. These results find that shared defective genes, comprising doublets and triplets, are more common than the appearance of only a single defective gene (consistent with Ikediobi et al. [4]). S3 master_appendix sheet appendix_table_I lists the singlets, doublets and triplets of defective genes observed in the NCI60. S4 master_appendix sheet appendix_table_II summarizes their counts. Inspection finds a diverse set of defective genes, some of which are not considered to have important roles in cancer. CDC25A, TP53, CDKN2A, CDKN2B, MYC, BRAF, EP300, KRAS, NOTCH1 and PTK2 are the ten most frequently occurring defective genes. To summarize, among defective genes appearing as doublets or triplets, these top ten defective genes appear in combination with themselves and with other genes. Collectively these results indicate that shared defective genes, with diverse roles in cellular biology, are common within the NCI60. Third, CellMiner [9] (https://discover.nci.nih.gov/cellminer/home.do) provides information about mechanism of action (MOA) for Food and Drug Administration (FDA) approved compounds. CellMiner reports 270 FDA compounds with unique NSC (National Service Center) and Name assignments that have been screened in the NCI60 (ca. 2019). One-hundred and ninety FDA screened compounds appear in the 46,798 GI50NCI60 responses. One-hundred and four MOA assignments exist for this set of compounds. These assignments consist of a primary MOA designation followed by secondary MOAs. For example, the assignment BCR-ABL|YK,FYN,LYN indicates BCR-ABL as the primary MOA, with YK (tyrosine kinase), FYN and LYN (both Proto-Oncogene, Src Family Tyrosine Kinases) as secondary MOAs. 
Thirty primary MOAs are assigned to this data. The complete set of MOAs for FDA screened compounds is listed in S5 master_appendix sheet appendix_table_III. The seven most frequent primary MOA classes spanning this data target tubulin:Tu, topoisomerase 2:T2, topoisomerase 1:T1, alkylation:A (A2: Alkylating_at_N-2_position_of_guanine, A6: Alkylating_at_O-6_of_guanine, A7: Alkylating_at_N-7_position_of_guanine, AlkAg: Alkylating agent and anti-metabolites:AM), DNA:D (Db: DNA_binder, DDI/R: DNA_damage_repair/inducer, Df: antifols, Ds: DNA_synthesis_inhibitor), kinases:PK and apoptosis:Apo. MOA:PK consists of over 100 kinase targets. FDA compounds screened in the NCI60, and their assigned CellMiner MOAs, will be used for linking MOA to chemosensitivity.
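As an illustration of the assignment format described above, the following minimal Python sketch splits a CellMiner-style MOA string into its primary and secondary parts. The function name parse_moa is hypothetical, and the sketch assumes that the only delimiters are "|" (primary versus secondary) and "," (between secondary MOAs), as in the BCR-ABL example; it is not CellMiner's own parser.

```python
def parse_moa(assignment):
    """Split 'PRIMARY|SEC1,SEC2' into (primary, [secondary, ...]).

    Assumption: '|' separates the primary MOA from the secondary MOAs,
    and ',' separates the secondary MOAs from each other.
    """
    primary, _, rest = assignment.partition("|")
    secondary = [s.strip() for s in rest.split(",") if s.strip()]
    return primary.strip(), secondary


# Example taken from the text; an assignment with no '|' has no secondary MOAs.
primary, secondary = parse_moa("BCR-ABL|YK,FYN,LYN")
print(primary, secondary)
```

Keeping the primary designation separate from the secondary list makes it straightforward to count primary MOA classes, which is the level at which the analysis groups compounds.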

METHODS: Data clustering

The methods for linking chemosensitivity, defective genes and MOA apply a sequential, multi-tiered approach. First, the GI50NCI60 data is organized into clusters. Many statistical tools are now available for clustering GI50NCI60 data [10]. Relying on our prior analysis [6], the results presented here use Self-Organizing-Maps (SOMs) [11,12]. Parameters from prior SOM analyses are selected for clustering (hexagonal nearest neighbors, Epanechnikov Function kernel [13]). SOM dimensions are based on a heuristic using the ratio of the first and second principal components of the data. The y-axis dimension is calculated as round(sqrt(munits/ratio*sqrt(0.75))), where munits = 5*nsamples^0.543. The x-axis dimension is calculated as round(munits/y-dimension). This heuristic is derived from the developers of SomPak, based on their usage. The sqrt(0.75) multiplier is explained as follows: “in the hexagonal lattice, the side lengths are not directly proportional to munits (= 5*nsamples^0.543) since the units on the y-axis are squeezed together by a factor of sqrt(0.75)”. Applying this procedure yields SOM map dimensions of 44 rows and 28 columns. Each of these 1232 SOM nodes defines a vector representing the average GI50NCI60 for all compounds clustered within each SOMDTP node (referred to hereafter as a node’s GI50codebook). S2 master_appendix sheet SOM_codebook lists the 1232 GI50codebooks. Each compound’s SOMDTP node will be referred to as its projection. The best projection can be extended to include the 2nd, 3rd, 4th, etc. SOMDTP nodes to determine whether a compound’s next best projections appear as SOMDTP neighbors. Prior analyses found GI50codebook patterns to be associated with a compound’s MOA (e.g. alkylating agents, tubulin targeting agents, DNA/RNA damaging agents and agents affecting mitochondrial function [6]). 
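The map-size heuristic quoted above can be written out as a short Python sketch. The function name som_grid_dims and the argument pc_ratio are hypothetical; pc_ratio stands for the ratio of the first and second principal components, whose value for the NCI60 data is not reported in the text. The call at the end therefore uses a placeholder ratio and should not be expected to reproduce the 44 by 28 map.

```python
import math


def som_grid_dims(n_samples, pc_ratio):
    """Return (y_dim, x_dim, munits) for the heuristic quoted in the text.

    munits = 5 * n_samples ** 0.543
    y_dim  = round(sqrt(munits / pc_ratio * sqrt(0.75)))
    x_dim  = round(munits / y_dim)
    """
    munits = 5 * n_samples ** 0.543
    # sqrt(0.75) accounts for the compression of rows in a hexagonal lattice.
    y_dim = max(1, round(math.sqrt(munits / pc_ratio * math.sqrt(0.75))))
    x_dim = max(1, round(munits / y_dim))
    return y_dim, x_dim, munits


# Placeholder ratio (assumption): the actual value depends on the data.
y_dim, x_dim, munits = som_grid_dims(46798, pc_ratio=2.0)
print(y_dim, x_dim, round(munits))
```

Because x_dim is obtained by rounding munits / y_dim, the product of the two dimensions stays within half a row of the target number of map units.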
Analysis of GI50codebook patterns has also been proposed for use in the development of clinical strategies based on differentially expressed molecular targets within classes of tumors [14,15]. Other applications include the recent identification of unique GI50codebook patterns within the NCI60 renal subpanel as the basis for further testing of the natural product-derived family of englerins [16]. While each SOMDTP node represents a cluster of GI50NCI60 values, a more global, lower resolution representation, that optimally groups SOMDTP nodes into meta-clades, is proposed. Three state-of-the art procedures are used to determine the optimal number of meta-clades; the elbow method minimizes the within-cluster sum of squares, WSS (a measure of within cluster similarity) and maximizes the between-cluster sum of squares (a measure of how separated each cluster is from the others), the silhouette method computes the average silhouette of observations for different numbers of clusters (selecting the optimal cluster size that maximizes the average silhouette) and the gap_statistic method [17] determines the total within intra-cluster variation for different numbers of clusters (selecting the cluster size that maximizes the gap_statistic). displays the results for these three methods applied to GI50codebooks. The optimal cluster size is indicated by the vertical lines in each plot. The elbow and gap methods rely on an inflection point on each curve for optimal cluster size, while the silhouette method seeks the largest value for silhouette width. Panel A displays WSS as circles for the elbow method and the first derivative of WSS, normalized by the local average WSS, as triangles. The maximum value of the derivative of WSS occurs for 28 clades. The silhouette method yields a maximum value at 26 clusters (cf. Panel B). 
The criterion for optimal cluster size using the gap statistic seeks the smallest number of clusters such that the gap statistic is within one standard deviation of the next gap statistic: Gap(k)≥Gap(k + 1)−sd(k+1) (displayed as triangles in Panel C), yields 28 clusters as optimal. Panel D displays the GI50codebook cluster dendrogram using the cutree tool [18] to group the dendrogram into 28 meta-clades (red lines) and 7 meta-clades (green lines). Based on these results a value of 28 was selected for the optimal number of meta-clades used in this analysis. The rationale for cutting the dendrogram at 7 clusters will be provided later in the analysis of MOAs.
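The gap-statistic selection rule stated above, the smallest k for which Gap(k) ≥ Gap(k + 1) − sd(k + 1), can be sketched in Python as follows. The function name and the input values are hypothetical and serve only to show the rule; they are not the values computed for the GI50codebooks.

```python
def smallest_k_by_gap(gap, sd):
    """Return the smallest k with gap(k) >= gap(k+1) - sd(k+1).

    gap[i] and sd[i] hold the gap statistic and its standard deviation
    for k = i + 1 clusters. Returns None if no k satisfies the rule.
    """
    for i in range(len(gap) - 1):
        if gap[i] >= gap[i + 1] - sd[i + 1]:
            return i + 1
    return None


# Illustrative values only (assumption), for k = 1..5.
gap_values = [0.10, 0.30, 0.50, 0.52, 0.50]
sd_values = [0.01, 0.01, 0.01, 0.01, 0.01]
print(smallest_k_by_gap(gap_values, sd_values))
```

The rule favors the smallest cluster count whose gap is not meaningfully improved by adding one more cluster, rather than the global maximum of the gap curve.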

Results for selecting optimal cluster size using the elbow method (Panel A), silhouette method (Panel B) and the gap_statistic method (Panel C).

Selection for the elbow method is based on the largest local derivative of the within groups sum of squares (Panel A triangles), the maximum silhouette width (Panel B) and the first non-negative value for Gap(k)-(Gap(k + 1)−sdk+1) (Panel C triangles). These results indicate an optimal number of clusters in the 26–28 range. Panel D displays the GI50codebook dendrogram (Euclidean, Ward’s) with cuts at 28 (red lines) and 7 clusters (green lines), respectively. A visual perspective of SOMDTP and the 28 meta-clades appears in . The 1232 GI50codebooks appear as a clustered heatmap (Euclidean,Ward’s) in Panel A. The dendrogram at the left edge of the heatmap, displays the dendrogram appearing in . The vertical ribbon, adjacent to this dendrogram, colored spectrally from blue to red, represents the subdivision of the hierarchal clade tree into 28 meta-clades. The pvclust utility [19], using random resampling, confirms this set of 28 meta-clades with a confidence p-value above 0.995 across resampling (n = 1000 resamples). Panel B in displays the 28 meta-clades on SOMDTP, colored according to the spectral-colored vertical ribbon in the left panel. The data reduction of 1232 SOMDTP nodes to 28 meta-clades yields a lower resolution, more manageable, perspective of the complete SOMDTP. The gray ribbon in Panel B displays the dendrogram cut into 7 major groups. Groupings consist of A: meta-clades 1–6, B: meta-clades 7–9, C: meta-clades 10–15, D: meta-clades 16–18, E: meta-clades 19–20, F: meta-clades 21–24 and G: meta-clades 25–28. The vertical grayscale colored bar in displays these seven groupings from A(bottom:black) to G(top:light gray). SOMDTP meta-clades will be assessed according to the differential chemosensitivity of tumor cell lines with and without defective genes.

Panel A displays a heatmap of GI50codebook, colored spectrally from green(chemoinsensitive) to red(chemosensitive) response.

Dendrogram at the left represents hierarchical clustering (Euclidean, Ward’s) of GI50codebooks (reproduced from ). Panel B displays the SOMDTP colored according to hierarchical cutree [18] specified at the optimal number of 28 meta-clades. The 28 colors appear spectrally from meta-clade 1 (dark blue), at the bottom of the hierarchical dendrogram, to meta-clade 28 (dark red), at the top of the hierarchical dendrogram. Grayscale bar adjacent to the 28 meta-clade spectrally colored bar displays the 7 meta-clades groupings. The NCI60 tumor cell lines clustered in the heatmap are ordered, left to right, as: SK.OV.3.Ovarian, NCI.H322M.Lung, DU.145.Prostate, A549.ATCC.Lung, HOP.62.Lung, OVCAR.5.Ovarian, TK.10.Renal, EKVX.Lung, A498.Renal, NCI.H226.Lung, SK.MEL.28.Melanoma, SK.MEL.2.Melanoma, BT.549.Breast, UACC.257.Melanoma, MALME.3M.Melanoma, SN12C.Renal, OVCAR.8.Ovarian, NCI.H23.Lung, IGROV1.Ovarian, MDA.MB.231.ATCC.Breast, OVCAR.4.Ovarian, CAKI.1.Renal, ACHN.Renal, UO.31.Renal, HS.578T.Breast, RXF.393.Renal, T.47D.Breast, HOP.92.Lung, HCC.2998.Colon, HT29.Colon, COLO.205.Colon, NCI.H460.Lung, KM12.Colon, PC.3.Prostate, OVCAR.3.Ovarian, M14.Melanoma, MDA.MB.435.Breast, UACC.62.Melanoma, SK.MEL.5.Melanoma, SW.620.Colon, HCT.15.Colon, HCT.116.Colon, LOX.IMVI.Melanoma, MCF7.ATCC.Breast, MCF7.Breast, NCI.H522.Lung, K.562.Leukemia, RPMI.8226.Leukemia, HL.60.TB..Leukemia, SR.Leukemia, MOLT.4.Leukemia, CCRF.CEM.Leukemia. Noteworthy is the mapping of the 28 cutree clades to discontinuous SOMDTP regions. Ideally, cutree clades might appear as contiguous regions the 2-dimensional SOMDTP. However, this is not the case. To obtain contiguous SOMDTP regions, an alternative hierarchical clustering algorithm would need to be used that only combines adjacent dendrogram clades that appear beside each other on SOMDTP. 
Although not pursued here, assigning contiguous SOM regions is an active area of research in dimensionality reduction [20], with specific focus on representing SOMs in one dimension [21]. Many of these efforts use randomized resampling to identify contiguous map regions by consensus. Usually standard hierarchical clustering suffices, and any outlying (noncontiguous) points can be accounted for manually. Towards this end, SOMDTP singletons, appearing as a hierarchical clade that maps to SOMDTP as a node without the same meta-clade neighbors, have been replaced by their neighborhood meta-clade assignments. There are 12 such cases (0.0097 = 12/1232) for this data set. An additional consideration for non-contiguous SOMDTP meta-clades may result from the assignment of distances used for clustering (Euclidean for hierarchical clustering and Epanechnikov Function [13] for SOMs). Our choice of the Epanechnikov Function for SOM clustering consistently yielded the lowest SOMDTP quantization errors [6]. However, a more likely explanation for non-contiguous SOMDTP meta-clades involves differences in clustering methodology. SOMs organize data by mapping each cluster to its most similar neighbors (six in the case of hexagonal mapping); whereas hierarchical clustering, as used to obtain heatmap dendrograms, builds each branch of the dendrogram by pairwise associations. The failure to map hierarchical clustering methods directly to contiguous SOMDTP regions is not unexpected and points more to the limitations of hierarchical methods to match non-hierarchical methods, regardless of distance metrics.

METHODS: Identification of SOMDTP nodes with enhanced chemosensitivity

SOMDTP nodes are analyzed for enhanced chemosensitivity of tumor cell lines with versus without defective genes. Each GI50codebook is divided into subsets comprising tumor cell lines with (GI50defective) and without (GI50wild-type) a defective gene. A Student’s t-test is used to identify cases of relatively higher chemosensitivity for GI50defective versus GI50wild-type. SOMDTP nodes with Student’s p-values less than or equal to 0.05 were further assessed for statistical significance by bootstrap resampling [22,23]. Each node’s GI50codebook was randomly shuffled and a Student’s t-test performed, while maintaining the tumor cell’s wild-type and defective gene status. One-thousand trials were conducted for each GI50codebook and a p-value was estimated by counting the number of times the shuffled p-value was less than the initial, unshuffled, p-value. Dividing this value by 1000 yields an estimate for the probability of the observed p-value occurring by chance. SOMDTP nodes with measured p-values less than 0.05 and below their estimated chance occurrence were accepted for further analysis. Sixty-five percent (65%, n = 635) of the 1232 SOM nodes pass this criterion and account for 121 defective genes. summarizes the results for GI50codebook at SOM1,13 (subscripting refers to the SOM node, i.e. SOMrow,column). Five NCI60 tumor cell lines have the defective ABL1 gene; with these tumor cell lines having a mean GI50defective response nine-fold higher than GI50wild-type (p = 6.91e-3). Panel A in displays GI50codebook, ordered from most chemosensitive to least chemosensitive values. NCI60 tumor cell lines with the ABL1 alteration, highlighted in red and representing GI50defective, are ranked at positions 3, 6, 8, 25 and 51. SOMDTP can also be viewed according to each NCI60 tumor cell (referred to as GI50component). Panel B of displays GI50component for each of the 5 tumor cell lines with defective ABL1. 
Regions of greatest and least chemosensitivity for each tumor cell are displayed spectrally from red to blue, respectively. Noteworthy is the location of Gleevec chemosensitivity in the most sensitive (i.e. red) GI50component SOMDTP regions for K-562, RPMI-8226 and HS578T.
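The shuffle-based significance check described above can be sketched in Python using only the standard library. Several points are assumptions rather than details from the text: the function names are hypothetical, the t-statistic is the pooled-variance Student's form, the comparison is two-sided, and the sketch compares absolute t-statistics instead of p-values. With fixed group sizes a larger absolute t corresponds to a smaller two-sided p-value, so counting shuffles with a larger |t| is equivalent to counting shuffles with a smaller p-value.

```python
import random
from math import sqrt
from statistics import mean, variance


def student_t(a, b):
    """Pooled-variance Student's t-statistic for two independent samples."""
    na, nb = len(a), len(b)
    pooled = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / (na + nb - 2)
    return (mean(a) - mean(b)) / sqrt(pooled * (1 / na + 1 / nb))


def shuffled_p(codebook, is_defective, n_trials=1000, seed=0):
    """Estimate how often shuffled codebook values beat the observed |t|.

    codebook:     one node's values, one per tumor cell line
    is_defective: parallel booleans; this status vector is kept fixed
    """
    rng = random.Random(seed)

    def t_for(values):
        defective = [v for v, f in zip(values, is_defective) if f]
        wild_type = [v for v, f in zip(values, is_defective) if not f]
        return student_t(defective, wild_type)

    t_obs = t_for(codebook)
    shuffled = list(codebook)
    hits = 0
    for _ in range(n_trials):
        rng.shuffle(shuffled)  # values move; defective status stays put
        if abs(t_for(shuffled)) > abs(t_obs):
            hits += 1
    return t_obs, hits / n_trials


# Illustrative values only (assumption): 5 defective and 10 wild-type lines.
values = [3.0, 3.1, 2.9, 3.2, 3.05,
          0.0, 0.1, -0.1, 0.2, 0.05, 0.15, -0.05, 0.12, -0.2, 0.3]
flags = [True] * 5 + [False] * 10
print(shuffled_p(values, flags, n_trials=200, seed=1))
```

A fixed seed keeps the estimate reproducible; in practice the number of trials bounds the resolution of the estimate at 1/n_trials.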

Panel A displays GI50codebook for SOM1,13, ordered from most to least chemosensitivity.

The 5 tumor cell lines with the defective ABL1 gene appear as red bars. Panel B displays GI50component for the 5 tumor cell lines with defective ABL1. SOMDTP nodes are colored spectrally from highest chemosensitivity (red) to lowest chemosensitivity (blue). Panel A of projects onto the SOMDTP the Student’s t-statistic for tumor cell lines with defective ABL1, where the t-statistic values are colored spectrally from low (blue) to high (red) significance. SOMDTP nodes without statistical significance (p > 0.05) are not colored. The most significant t-statistics for defective ABL1 are located mainly in SOM meta-clades 1, 10, 14 and 26. Gleevec appears as the most significant SOM5,15 node in meta-clade 14. For comparison, the results for KRAS are projected in Panel B of . There are 12 tumor cell lines (A549/ATCC-Lung, CCRF-CEM-Leukemia, HCC-2998-Colon, HCT-116-Colon, HCT-15-Colon, HOP-62-Lung, NCI-H23-Lung, NCI-H460-Lung, OVCAR-5-Ovarian, RPMI-8226-Leukemia, SK-OV-3-Ovarian and SW-620-Colon) harboring defective KRAS, with significant chemosensitive SOMDTP nodes appearing in meta-clades 21, 22 and 27. SOM meta-clade 21 is the location of the FDA compound cytarabine (ara-C) and is consistent with the conclusion of Ahmad et al. [24] that adult AML patients carrying defective KRAS benefit more from higher ara-C doses than wt KRAS patients.

Panels A and B display significant chemosensitive SOMDTP nodes (projected as their t-statistic from a Student’s t-test; blue:least, red:most significant) for tumor cell lines with defective ABL1 and KRAS, respectively.

Panel C displays the 28 SOMDTP meta-clades. S3 and S4 Appendix Figures in S1 File display additional examples for the defective genes PIK3R1 and IGF1R, respectively. PIK3R1 (Phosphatidylinositol 3-Kinase Regulatory Subunit Alpha) and a related gene, PIK3CA (PI3-Kinase Subunit Alpha) are lipid kinases capable of phosphorylating the 3’OH of the inositol ring of phosphoinositides. Both are responsible for coordinating a diverse range of cell functions including proliferation and survival. Defective PIK3CA has been documented by Whyte and Holbeck [25] to enhance tamoxifen sensitivity in selected NCI60 tumor cell lines. The results here also find chemosensitivity in NCI60 tumor cell lines having defective PIK3R1. The second example of defective IGF1R supports the importance of evaluating drug sensitivity for compounds targeting leukemia cell lines [26] and the emergence of IGF1R as a potential therapeutic target for the treatment of different types of cancer including plasma cell myeloma, leukemia, and lymphoma [27]. Both examples illustrate a potential role of defective genes in chemosensitivity. The Student’s t-statistic represents the significance when comparing the chemosensitivity of a SOMDTP node for tumor cell lines with, versus without, defective genes. Parametric tests, such as the Student’s t-test, are preferred over non-parametric tests (Wilcoxon/Mann-Whitney, Kruskal-Wallis) when the underlying sample distribution is known and normal. The data analyzed here represents a strongly normal distribution (p < 0.001, lognormal test) with small deviations at the tails from a linear log normal quantile-quantile plot, supporting the use of a parametric statistic. A bootstrap procedure is applied to cases with a significant Student’s t-test as a correction against Type I error for the following reasons. First, a bootstrap method can be used to estimate the sampling distribution of GI50codebook for each SOMDTP node. 
This test utilizes the node’s codebook vector as the initial sample representative and applies a bootstrap procedure to estimate the sampling distribution. Since 1000 samples were used, the p-value estimate corrects the empirical estimate using a division by 1000. This correction parallels multiple test corrections for traditional statistics [28]. Second, in this design there are 1000 statistical tests performed for each of the 1232 SOMDTP nodes. An important caveat of jointly using GI50codebooks to create some type of correction for multiple tests is their lack of independence [11,12]. This non-independence is disallowed when applying Bonferroni, Holm or Benjamini-Hochberg [29] corrections. Nonetheless, a long-standing debate continues about bootstrap applications, possible overestimation of ‘true’ values, and appropriate corrections for random noise [30].

Methods: Mapping MOA to SOMDTP

SOMDTP projections for the most frequent primary CellMiner MOA assignments (Tu, T1, T2, A, D, Apo and PK) are displayed in Panel A of . Panel B displays the histogram of SOMDTP node counts for these MOA assignments. Inspection indicates that MOA classes A, D, T1 and T2 appear mainly in the upper right SOMDTP region (SOM meta-clade 21; Group F), while MOA Apo appears mainly in the upper left region (SOMDTP meta-clades 25 and 26; Group G). Tu compounds are found mainly in SOMDTP meta-clades 16, 17 and 18 (Group D). SOMDTP meta-clade 19 (Group E) consists of only MOA PK, while MOA PK compounds are in the majority for SOMDTP meta-clades 1 through 6 (Group A). The horizontal gray scale bar at the bottom of the right panel identifies the seven meta-clade groups assigned earlier (cf. Figs and ). Inspection indicates relative similarities of MOA types within each of the seven meta-clade groups A:G. Notable is the majority representation of MOA:PK in Group A and MOA:Tu in Group D. Detailed results for MOAs across meta-clade groups will be presented later.

Panel A: SOMDTP projections for FDA approved compounds for the primary CellMiner assigned MOAs.

Projections include the top 10th percentile of SOMDTP nodes for each compound. Panel B: histogram of the counts for these primary MOAs across SOM meta-clade groups. Primary MOAs appear color-coded in each vertical bar, with their heights corresponding to MOA counts in each meta-clade. Horizontal grayscale bar below Panel B indicates meta-clade groups A:G (reproduced from ). lists the most frequent primary MOAs for meta-clade groups A:G. These results segregate the primary MOAs into separate regions of the cutree = 7 hierarchical dendrogram (gray bar in and ); MOA:PK appears at the bottom, MOAs targeting DNA and Apo appear at the top and mixtures of primary MOAs appear in the middle. These meta-clade groupings will be analyzed in greater detail for links of MOAs to defective genes.

Methods: Enrichment of MOA

Fisher’s exact and chi-square tests are used to identify cases where the SOMDTP projections of defective genes are statistically enriched in co-projections of MOA types. These tests are useful for categorical data that result from classifying objects in two different ways, and are used to determine a statistical measure for the random likelihood of the intersection of each classification. For each defective gene the number of SOMDTP nodes with significant Student’s t-statistics is determined (Ngene). FDA approved compounds that co-project to Ngene determine a unique set of MOAs associated with each defective gene (MOAgene). All FDA compounds that share any MOAgene are collected (NFDA), where up to the 10th best FDA projections are included in the count. Extending FDA projections beyond only the best node achieves two goals. First, it establishes SOMDTP regions rather than individual nodes for MOA classification. Second, increasing the numbers in the contingency table extends significance testing to include both Fisher’s exact and chi-square testing. The contingency table entries become: p11 = intersection(Ngene,NFDA), p12 = Ngene−p11, p21 = NFDA−p22 and p22 = p11 by default to conserve equal row and column sums. illustrates a sample result of the steps for calculating the Fisher’s exact statistic using ABL1. Panel A (reproduced from ) displays the significant SOMDTP nodes for defective ABL1, where Ngene = 48. Collecting the MOAs for the 11 FDA compounds co-projected to Ngene finds 6 MOAs (Apo, Ho, HSP90, NonCan, PK and BCR-ABL). Panel B in displays the top 10th percentile of all FDA compounds sharing any one of these 6 MOAs, to yield NFDA = 189. Completing the contingency table with their intersection (22) results in a Fisher’s exact score of 1.958262e-09 (log(p-value) = -20.051208).
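To make the contingency-table step concrete, the following Python sketch builds the 2x2 table from the three counts and computes a two-sided Fisher's exact p-value from the hypergeometric distribution using only the standard library. The function names are hypothetical. The table layout is a literal reading of the entries given above (p11 = intersection, p12 = Ngene − p11, p21 = NFDA − p22, p22 = p11), and the two-sided convention (summing all tables no more probable than the observed one) is an assumption; the sketch has not been verified to reproduce the reported ABL1 score.

```python
from math import comb


def contingency_from_counts(n_gene, n_fda, overlap):
    """Build (p11, p12, p21, p22) following the entries stated in the text."""
    p11 = overlap
    p12 = n_gene - p11
    p22 = p11  # as stated in the text
    p21 = n_fda - p22
    return p11, p12, p21, p22


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact p-value for the table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    total_tables = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of x in the top-left cell.
        return comb(row1, x) * comb(row2, col1 - x) / total_tables

    lo, hi = max(0, col1 - row2), min(row1, col1)
    p_obs = prob(a)
    # Small tolerance so ties with the observed table are included.
    return sum(prob(x) for x in range(lo, hi + 1)
               if prob(x) <= p_obs * (1 + 1e-9))


# Counts from the ABL1 example in the text.
table = contingency_from_counts(n_gene=48, n_fda=189, overlap=22)
print(table, fisher_exact_two_sided(*table))
```

Using exact integer binomial coefficients avoids overflow and rounding problems that can arise when factorials are evaluated in floating point for tables of this size.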

Panel A displays the significant SOMDTP nodes for ABL1 (Ngene = 48).

Eleven FDA compounds are co-projected to Ngene, yielding 6 MOAs. The SOMDTP in Panel B displays the top 10th percentile of projections for FDA compounds sharing these MOAs (NFDA = 189). The intersection of Ngene and NFDA = 22, yielding a Fisher’s exact p-value of 1.958262e-09 (log(p-value) = -20.05). displays a bar chart of the log(p-value) for the 47 genes with significant (p ≤ 0.05) Fisher’s exact scores, when tested over the complete SOMDTP. Defective genes with the top-most significance scores include MTOR, SPEN, CDC25A, PTK2, TEK, CYP11B1, CYP11B2, NCOR2, PIK3CG, MSH2, APC and MMP1. Note that this subset of defective genes is not exclusively associated with human cancers. Fisher’s exact and chi-square tests for MOAs will be applied to meta-clade groups (A-E). The average log(p-value) for both statistics will be reported as a contingency score.

Fisher’s exact scores (log(p-value), p-value ≤ 0.05).

Results are based on classifications using up to the 10th best SOM projection nodes for FDA compounds. Forty-seven defective genes have significant Fisher’s exact scores when tested over the complete SOMDTP.

Results

The multi-tiered approach described in the METHODS builds a framework to achieve the study’s goal of associating chemosensitivity, defective genes and MOA. In summary: chemosensitivity data (n = 46k, GI50NCI60) is clustered as SOMDTP (n = 1232, GI50codebooks), subdivided into meta-clades (n = 28) and 7 meta-clade groups (A-G). Defective genes (n = 368) are analyzed according to significant chemosensitivity on SOMDTP (n = 121, Student’s t/bootstrap) and enrichment for type of MOA of defective genes (n = 47 genes, contingency score, reported as the average of the log(p-value) for the Fisher’s exact and chi-square tests). Contingency scores will be used to identify significant MOA enrichments for defective genes across meta-clade groups (A-G). The results for SOMDTP clustering will be presented first, followed by the results for MOA enrichments in groups A-G.

Results: SOMDTP

displays SOMDTP, colored according to similarity of neighboring GI50codebooks; where the most similar GI50codebook neighbors are displayed in deep red and the most dis-similar GI50codebook neighbors appear in bright yellow. The 28 optimal meta-clade boundaries are displayed as a black line, with the boundaries of the 7 meta-clade groups super-imposed as a white line. Two distinctive features characterize SOMDTP. First, the best projections of FDA approved compounds appear as blue hexagons in , where hexagons are sized according to the number of FDA agents appearing in any SOMDTP node. Inspection finds a general tendency for approved agents to project to SOMDTP nodes with unique patterns (e.g. dissimilar GI50codebooks). Statistical support for this observation is displayed in in the form of histograms based on intra-node GI50codebook distances for nodes containing FDA approved agents (top histogram) and lacking FDA approved agents (bottom histogram). A Student’s t-test for the vector distance between these two groups finds a p-value of 5.3e-7, in support of the visual association of FDA compounds and unique (e.g dis-similar) chemosensitivity patterns. Second, compound names for FDA screened agents are listed as a table in and projected on to SOMDTP in . A listing of these nodes and their SOMDTP projections also appears in S5 master_appendix sheet appendix_Table_III. In brief, FDA compounds with known MOAs are grouped together, with, for example, nucleic acid targeting agents appearing in the upper right corner of SOMDTP (meta-clade 21), tubulin targeting agents (meta-clades 16 and 18) and defective BRAF targeting agents (meta-clade 19). Collectively, these results support our prior report [6] of associations between NCI60 screened compounds, their MOAs and projections on SOMDTP.

Panel A.

SOMDTP is colored according to similarity of GI50codebooks, where the most similar node neighbors are displayed in deep red and the most dis-similar node neighbors appear in bright yellow (see vertical bar adjacent to SOMDTP). The 28 optimal meta-clade boundaries are displayed as a black line, with the boundaries of the 7 meta-clade groups super-imposed as a white line. FDA approved compounds are projected onto SOMDTP as blue hexagons, where hexagons are sized according to the number of FDA agents appearing in any node. Panel B displays the between node GI50codebook Euclidean distances for nodes with FDA compound projections (top) and without (bottom). Panel C lists FDA compound names grouped by 28 meta-clades. Panel D displays SOMDTP with FDA compounds (blue hexagons), meta-clade boundaries (solid lines) and meta-clade labels as numbers. FDA approved projections to SOMDTP nodes are listed in S5 master_appendix sheet appendix_Table_III.
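The meta-clade ranges used in the group headings that follow can be captured as a small lookup. The sketch below only restates those ranges (A: 1-6, B: 7-9, C: 10-15, D: 16-18, E: 19-20, F: 21-24, G: 25-28); the dictionary and function names are hypothetical.

```python
# Meta-clade ranges for the seven groups, as given in the section headings.
GROUP_RANGES = {
    "A": (1, 6),
    "B": (7, 9),
    "C": (10, 15),
    "D": (16, 18),
    "E": (19, 20),
    "F": (21, 24),
    "G": (25, 28),
}

def meta_clade_group(meta_clade):
    """Return the group letter (A-G) containing a meta-clade number (1-28)."""
    for group, (first, last) in GROUP_RANGES.items():
        if first <= meta_clade <= last:
            return group
    raise ValueError(f"meta-clade {meta_clade} is outside 1-28")

# Meta-clades mentioned in the text: nucleic acid agents (21),
# tubulin agents (16 and 18) and BRAF targeting agents (19).
groups = {m: meta_clade_group(m) for m in (16, 18, 19, 21)}
```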

Results: Group A (meta-clades 1 through 6)

The results for SOM meta-clade group A find twelve defective genes with significant contingency scores (ABL1, ACVR2A, CDC25A, MMP1, MTOR, NCOR2, NF1, PIK3R1, RB1, RPTOR, SOX9 and ZHX2) associated with eleven MOA classes (PK, Ang, Ho, PARP, AM, BCR-ABL, NonCan, Db, HDAC, HYP and Pase). Panel A displays the contingency scores, ordered left to right, from the most to least significant. Panel B displays the SOMDTP projections for these significant defective genes. Projections, colored according to the legend, represent instances where significant Student’s t-statistics co-project with compounds having these MOAs. For example, ABL1 projections (blue) appear mainly in the lower right region. Color coding is unique for all defective genes analyzed herein and is intended to provide a visual separation for each defective gene. Panel C lists the SOMDTP node counts, ordered from top to bottom and left to right. Panel D displays a histogram for these counts.
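The Student's t-statistics referred to above compare, for a given SOMDTP node, the GI50 responses of cell lines with and without a defective gene. A minimal sketch follows. The resampling procedure used in the paper is not detailed in this section, so a label-permutation p-value is swapped in here as an assumption; the function names and the log10 GI50 values are likewise hypothetical.

```python
import math
import random
import statistics

def student_t(x, y):
    """Pooled-variance Student's t-statistic for two independent samples."""
    nx, ny = len(x), len(y)
    pooled = ((nx - 1) * statistics.variance(x)
              + (ny - 1) * statistics.variance(y)) / (nx + ny - 2)
    if pooled == 0:
        return 0.0
    diff = statistics.fmean(x) - statistics.fmean(y)
    return diff / math.sqrt(pooled * (1 / nx + 1 / ny))

def resampled_p_value(x, y, n_resamples=2000, seed=0):
    """Two-sided p-value from shuffling the defective/wild-type labels."""
    rng = random.Random(seed)
    observed = abs(student_t(x, y))
    pooled = list(x) + list(y)
    hits = 0
    for _ in range(n_resamples):
        rng.shuffle(pooled)
        if abs(student_t(pooled[:len(x)], pooled[len(x):])) >= observed:
            hits += 1
    # Add-one correction keeps the estimate away from exactly zero.
    return (hits + 1) / (n_resamples + 1)

# Hypothetical log10(GI50) values for one node; lower means more sensitive.
defective = [-7.2, -7.0, -7.4, -7.1, -6.9]
wild_type = [-5.1, -5.3, -4.9, -5.0, -5.2, -5.4]
t_stat = student_t(defective, wild_type)
p_value = resampled_p_value(defective, wild_type)
```

A strongly negative statistic here indicates enhanced chemosensitivity (lower GI50) in the cell lines carrying the defective gene.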

Panel A displays the contingency scores, ordered left to right, from the most to least significance.

The horizontal dashed lines represent significance thresholds of p ≤ 0.05 (lower line) and p ≤ 0.1 (upper line). Panel B displays the SOMDTP co-projections of significant defective genes and MOAs for FDA compounds. Only co-occurrences for SOMDTP projections of FDA compounds are displayed. The SOMDTP region displayed in Panel B represents the boundary for meta-clades 1 through 6 (see the white border in ). Panel C lists the counts for co-occurrence (see S6 master_appendix sheet gp_A). Panel D displays the tabular results in Panel C as a histogram. Node colors for defective genes correspond to the legend inserted into the upper left panel. The counts displayed in Panel C represent the top 10th percentile of SOMDTP co-projections for FDA compounds. A consistent coloring scheme is used for this and all subsequent figures, such that all defective genes presented in the RESULTS are assigned a unique color. S13 master_appendix sheet gp_A_FDA lists the counts for each FDA and MOA entry for these significant genes.

The most frequently appearing defective genes are MMP1, associated mainly with MOAs PK, AM and Ho, and NF1, associated with MOAs PK, BCR-ABL, PARP and Ho. MOA:PK occurs most frequently, with MMP1, NF1 and PIK3R1 as the most frequent defective genes. The second highest count is for MOA:Ang, which is associated with defective genes RPTOR, SOX9 and MTOR. The next most frequent counts are associated with MOA:Ho (MMP1, NF1, PIK3R1, NCOR2 and ABL1), MOA:PARP (NF1, NCOR2, PIK3R1 and ACVR2A), MOA:AM (MMP1 and ZHX2) and MOA:BCR-ABL (NF1 and PIK3R1). Inspection of Panel D summarizes these results. For example, MOA:PK has MMP1 (teal) and NF1 (light red) as representing the majority of co-projections, while MOA:BCR-ABL is dominated by NF1 (light red). The common feature of the defective genes associated with SOM meta-clade Group A is their potential to influence the Ras/Raf/MEK/ERK and the PI3K/AKT pathways.
The Ras/Raf/MEK/ERK mitogen activated protein kinase (MAPK) cascade is constitutively active and is the dominant pathway driving the production of MMP1 [31], the defective gene with the highest SOMDTP node count. MMP1 also modulates cytoskeleton organization, cell motility and additional metastasis signature genes [32], which in turn are mediated by the ERK pathway [33]. In general, the expression of the family of matrix metalloproteinases (MMP) is broadly affected by intracellular signaling via the MAPK family. Targeting the RAF-MEK-ERK mitogen-activated protein kinase cascade is being actively pursued for the treatment of cancer [34]. A direct role of MMP1 in chemosensitivity has not been reported. However, Zhou et al. [35] identify MMP1 as a potential gene conferring resistance to EGFR-targeting drugs in non-small cell lung cancer. Rapamycin significantly enhanced the expression of interstitial collagenase (MMP1) at the protein and mRNA levels [36]. An assessment of upregulated expression levels in serous ovarian cancer cell lines by Zhang et al. [37] finds matrix metalloproteinase 1 (MMP1) to be among the most upregulated mRNAs in the chemoresistant cell lines. That MMP1 is the most frequent defective gene associated with MOA:PK (cf. Panels C and D), combined with its role in chemosensitivity, suggests that defective MMP1 may play a role in the weak GI50NCI60 responses to PIK3 and EGFR targeting agents screened in the NCI60. NF1 has the 2nd highest node count in group A and has links to the MAPK cascade. For example, loss of NF1 gene expression leads to increased RAS activation and hyperactivation of the downstream RAS effectors, including the RAF/MEK/ERK and the PI3K/AKT pathways [38]. Abnormal activation of RAS by defective NF1 is a central driver event in some soft-tissue sarcomas (MPNST). Receptor tyrosine kinases (RTKs), including PDGFRA and EGFR, can activate RAS signaling and downstream factors such as MEK and mTOR. Ki et al. [39] find that the addition of mTOR inhibitors to cell lines harboring defective NF1 enhances the activity of DNA targeting agents. Defective genes that impact PI3K-Akt-mTOR signaling could weaken the tumor cell and enhance susceptibility to chemotherapeutic drugs. A noteworthy entry in S13 master_appendix sheet gp_A_FDA is for olaparib, MOA:PARP and defective gene NF1. Combination treatment with olaparib and various inhibitors of PD-L1, VEGFR, PI3K, and AKT may effectively inhibit the growth of rapidly proliferating triple negative breast cancer cell lines [40]. A review of candidate synthetic lethality partners to PARP inhibitors in the treatment of ovarian clear cell cancer by Kawahara et al. [41] finds PARP and NF1 to be synthetic lethality pairs [42]. Synthetic lethality (SL) describes the genetic interaction by which the combination of two separately non-lethal mutations results in lethality [43]. Generally, the ablation of two genes located in parallel pathways (leading to cell survival or a common essential product) is one of the important patterns causing synthetic lethality. Synthetic lethality appears to be achieved with combined EGFR and PARP inhibition [44]. SL has recently emerged as a promising new approach to cancer therapy [45]. MOA:Ang ranks 2nd among the MOAs listed for group A and is associated with defective RPTOR, SOX9 and MTOR. Oncogenic activation of the phosphatidylinositol-3-kinase (PI3K) and mammalian target of rapamycin (MTOR) facilitates tumor formation, disease progression, therapeutic resistance, and the sensitivity of prostate cancer cell lines to PI3K-AKT-mTOR-targeted therapies [46]. SOX9 is reported to promote tumor growth, proliferation, migration, invasion and metastasis, and to regulate Wnt/β-catenin signalling [47]. Inhibition of SOX9 expression led to a significant reduction in primary tumor growth, angiogenesis, and metastasis [48].
The full extent of the PI3K-AKT-mTOR signaling network during tumor/angiogenesis, invasive progression and disease recurrence remains to be determined. The existing results link chemosensitivity of MOA:Ang agents to a selective set of defective genes.
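The co-projection counts discussed for group A can be sketched as set intersections between the SOMDTP nodes that are significant for a defective gene and the nodes to which FDA compounds of a given MOA project, followed by retention of the highest counts. The node identifiers below are hypothetical, and the handling of the top 10th percentile (keeping the top 10% of ranked gene-MOA pairs, with at least one pair retained) is an assumption about a detail this section does not specify.

```python
import math
from collections import Counter

def co_projection_counts(gene_nodes, moa_nodes):
    """Count SOM nodes shared by each (defective gene, MOA) pair.

    gene_nodes: gene -> set of node ids with a significant t-statistic.
    moa_nodes:  MOA  -> set of node ids holding FDA compounds of that MOA.
    """
    counts = Counter()
    for gene, g_nodes in gene_nodes.items():
        for moa, m_nodes in moa_nodes.items():
            shared = len(g_nodes & m_nodes)
            if shared:
                counts[(gene, moa)] = shared
    return counts

def top_fraction(counts, fraction=0.10):
    """Keep the highest-count pairs (assumed reading of 'top 10th percentile')."""
    ranked = counts.most_common()
    keep = max(1, math.ceil(len(ranked) * fraction))
    return ranked[:keep]

# Hypothetical node ids for two genes and two MOA classes.
gene_nodes = {"MMP1": {1, 2, 3, 4}, "NF1": {4, 5}}
moa_nodes = {"PK": {2, 3, 4, 5}, "PARP": {5, 9}}
counts = co_projection_counts(gene_nodes, moa_nodes)
top = top_fraction(counts)
```

Ranking the resulting counts by gene or by MOA reproduces the kind of ordering reported in the count tables and histograms for each group.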

Results: Group B (meta-clades 7 through 9)

Eleven MOA classes (PK, Ho, Db, NonCan, Ds, Apo, AM, T2, A7, HDAC, and PARP) are associated with eight defective genes (ZHX2, MECOM, MMP1, EP300, MTOR, BMP7, CYP11B1 and CYP11B2) for SOM meta-clades 7 through 9. ZHX2 has the most and EP300 the least significant contingency scores (Panel A). ZHX2 projects to the central region of group A, while CYP11B2, MECOM and MMP1 project to the perimeter regions (Panel B). Panels C and D indicate that defective genes ZHX2, MECOM, MMP1 and PTK2 and MOAs PK, Ho and Db occupy the most SOMDTP nodes. These defective genes are associated with the Gene Ontology (GO) pathway Regulation_of_Response_to_Stress, with a potential to influence cellular functions such as differentiation and translation. ZHX2 is a member of the zinc fingers and homeoboxes gene family that acts as a transcriptional repressor. GO annotations related to ZHX2 also include DNA-binding_transcription_factor_activity. MECOM (MDS and EVI1 complex locus protein), with the 2nd highest SOMDTP counts, is found to be commonly enriched in cancer cell lines. Makondi et al. [49] suggest that targeting the MAPK signal transduction pathway by targeting MECOM might increase tumor responsiveness to irinotecan treatment. Saito et al. [50] note that EVI1 alters metabolic programming associated with leukemogenesis and increases sensitivity to L-asparaginase. The 3rd most frequent gene, MMP1, is in the family of matrix metalloproteinases that are involved in the breakdown of extracellular matrix and contribute to metastasis, as noted above. S7 master_appendix sheet gp_B (meta-clades 7 through 9) lists the defective genes with a significant contingency score (p < 0.1) for each meta-clade in Group B. Row entries in S14 master_appendix sheet gp_B_FDA list the counts for each FDA and MOA entry for significant defective genes.

Results for group B (meta-clades 7 through 9).

The SOMDTP region displayed in Panel B represents the boundary for meta-clades 7 through 9 (see the white border in ). S7 master_appendix sheet gp_B lists the table in Panel C. See the legend of for details. S14 master_appendix sheet gp_B_FDA lists the FDA compounds associated with these defective genes.

Results: Group C (meta-clades 10 through 15)

Contingency scores order the defective genes as: NOTCH1, RBPJ, IGF1R, PIK3CG, CDKN2A, ATM, NRAS, MSH2, CDKN2B, CDC25A, NCOR2, RPTOR, STAT2, EIF5A2, MYC, SPEN and MTOR (Panel A). IGF1R projects mainly at the perimeter of SOMDTP for group C, while the remaining defective genes project to more central regions (Panel B). Seventeen MOA classes, ordered from most to least node counts, are Ds, PK, HDAC, Apo, Ho, AM, BCR-ABL, A7, NFkB, BRD, Mito, NonCan, PARP, KLF4, PSM, T1 and SMO (Panels C and D).

Results for group C (meta-clades 10 through 15).

The SOMDTP region displayed in Panel B represents the boundary for meta-clades 10 through 15 (see the white border in ). S8 master_appendix sheet gp_C lists the table in Panel C. See the legend of for details. S15 master_appendix sheet gp_C_FDA lists the FDA compounds associated with these defective genes. The ten most frequent defective genes, IGF1R, CDC25A, NOTCH1, NCOR2, RPTOR, CDKN2A, MSH2, NRAS are associated with MOAs Ds, PK, HDAC, Apo, Ho, AM and BCR-ABL. The salient feature of these defective genes is their role in arresting the cell cycle. Cellular processes involving phosphorylation function to interrupt the cell-cycle, particularly from members of the family of tyrosine kinases. For example insulin-like growth factor 1 receptor (IGF1R) belongs to the large family of tyrosine kinase receptors and is activated by a hormone called insulin-like growth factor 1 (IGF-1) and by a related hormone called IGF-2 [51]. SOMDTP nodes in meta-clades 10 through 15 that are associated with defective IGF1R exist for chemosensitivity mainly to leukemia cell lines. IGF1R is often overexpressed by tumors and mediates proliferation and apoptosis protection [52,53]. As noted earlier [27], evaluation of drug sensitivity for compounds targeting leukemia cell lines has prompted the emergence of IGF1R as a potential therapeutic target for the treatment of leukemia. Weisberg, et al. [54] report that IGF1R protein expression/activity was substantially increased in mutant RAS-expressing cell lines, and suppression of RAS led to decreases in IGF1R. Synergy between MEK and IGF1R inhibitors correlated with induction of apoptosis, inhibition of cell cycle progression, and decreased phospho-S6 and phospho-4E-BP1. They suggested that given the complexity of RAS signaling, it is likely that combinations of targeted agents will be more effective than single agents, inclusive of IGF1R inhibitors. 
CDC25A, with the 2nd highest node counts, affects cell proliferation and its expression is thought to be controlled through the PI3K-AKT-MTOR signaling pathway [55]. Sadeghi et al. [56] suggest that CDC25A controls the cell proliferation and tumorigenesis by a change in expression of proteins involved in cyclin D1 regulation and G1/S transition. The finding that defective CDC25A is associated with MOA:PK is consistent with the appearance of pazopanib and axitinib in the FDA compounds listed in S15 master_appendix sheet gp_C_FDA. The evolutionarily conserved NOTCH family of receptors regulates a myriad of fundamental cellular processes including development, tissue patterning, cell-fate determination, proliferation, differentiation and cell death [57]. The crosstalk among Notch1 (3rd highest node counts) and other prominent molecules/signaling pathways includes DNA damage repair(DDR) [58]. DDR is a complex protein kinase based signaling pathway which is conducted by the members of the phosphoinositide 3-kinase-like kinase (PIKK) family, such as ataxia telangiectasia mutated (ATM). NOTCH1 is a major oncogenic driver in T cell acute lymphoblastic leukemia [59]. NOTCH1 siRNA can effectively inhibit the expression of NOTCH1 gene, inhibit the proliferation of lung cancer A549 cell lines and increase the sensitivity to chemotherapeutic drugs [60]. Of specific interest is the intersection of defective NOTCH1 and the projection for imatinib (S15 master_appendix sheet gp_C_FDA). Aljedai et al. [61] explored the role of NOTCH1 signaling in chronic myeloid leukemia cell lines to find cross-talk between NOTCH1 and BCR-ABL. Their results revealed that imatinib induced BCR-ABL inhibition results in upregulation of NOTCH1 activity. In contrast, inhibition of NOTCH1 leads to hyperactivation of BCR-ABL. 
They proposed that the antagonistic relationship between NOTCH1 and BCR-ABL in CML suggests a combined inhibition of NOTCH1 and BCR-ABL may provide superior clinical response over tyrosine-kinase inhibitor monotherapy. CDKN2A, MSH2 and ATM (with the next most frequent node counts) have roles in cell cycle arrest. CDKN2A is capable of inducing cell cycle arrest in the G1 and G2 phases. Gene Ontology (GO) annotations related to CDKN2A include transcription factor binding. MSH2 and ATM are components of the post-replicative DNA mismatch repair system (MMR), whereby activation of checkpoint arrest and homologous DNA repair are necessary for maintenance of genomic integrity during DNA replication [62]. Germ-line mutations of the ataxia telangiectasia mutated (ATM) gene result in the well-characterized ataxia telangiectasia syndrome, which manifests with an increased cancer predisposition. Somatic ATM mutations or deletions are commonly found in lymphoid malignancies. Such mutations may be exploited by existing or emerging targeted therapies that produce synthetic lethal states. Cancers with mutations in genes encoding proteins involved in DNA repair may be more sensitive to treatments that induce synthetic lethality by inducing DNA damage or inhibiting complementary DNA repair mechanisms.

Results: Group D (meta-clades 16 through 18)

Contingency scores order the defective genes as: MAP2K3, PTK2, BRAF, CYP11B1, CYP11B2, MMP9, MYC, FLT1 and RBPJL (Panel A). MYC, BRAF and FLT1 project to mainly non-overlapping SOMDTP regions (Panel B). Nine MOA classes, ordered from most to least node counts, are Tu, HSP90, NonCan, PSM, Db, T1, PK, T2 and Pase (Panels C and D). MOA:Tu dominates these results, while MOA:HSP90, MOA:NonCan and MOA:PSM appear with the next highest node counts. The most frequent defective genes include MYC, RBPJL, FLT1 and MMP9. S16 master_appendix sheet gp_D_FDA lists the FDA compounds associated with these defective genes.

Results for group D (meta-clades 16 through 18).

The SOMDTP region displayed in Panel B represents the boundary for meta-clades 16 through 18 (see the white border in ). S9 master_appendix sheet gp_D lists the table in Panel C. See legend of for additional details. S16 master_appendix sheet gp_D_FDA lists the FDA compounds associated with these defective genes. Most of the defective genes in group D are involved with the mitotic component of tumor cell proliferation. For example, MYC encodes a nuclear phosphoprotein that has been implicated in the regulation of cell proliferation and the development of human tumors [63] and is regarded as a major determinant of mitotic cell fate [64]. Inhibition of microtubule polymerization has been reported to block mitosis and induce cell death [65]. Conacci-Sorell et al. [66] report the expression of MYC results in the induction of the actin-bundling protein fascin, formation of filopodia, and plays a role in cell survival, autophagy, and motility. MYC also recruits acetyltransferases that modify cytoplasmic proteins, including α-tubulin. Marzo-Mas et al. [67] find the antiproliferative activity of colchicine to inhibit tubulin polymerization to be modulated by the downregulation of c-MYC expression. Alexandrova et al. [68] report that the N-terminal domain of c-MYC associates with alpha-tubulin and microtubules. Marzo-Mas et al. [67] also found that tubulin binding compounds were able to downregulate the expression of the VEGF, hTERT and c-MYC genes. Others [69] have proposed targeting oncogenic MYC as a strategy for cancer treatment, proposing the destruction of a microtubule-bound MYC reservoir during mitosis contributes to vincristine´s anti-cancer activity [70]. Collectively these results support a role of defective MYC in chemosensitivity to tubulin targeting agents. The 2nd most frequent defective gene is RBPJL. RBPJL binds to DNA sequences almost identical to that bound by the Notch receptor signaling pathway transcription factor recombining binding protein J (RBP-J). 
A related family member RITA (RBPJ Interacting And Tubulin Associated 1) also acts as a negative regulator of the Notch signaling pathway that induces apoptosis and cell cycle arrest in human hepatocellular carcinoma [71]. Structural and biophysical studies demonstrate that RITA binds RBP-J and biochemical and cellular assays suggest that RITA interacts with additional regions on RBP-J [72]. Emerging evidence reveals Notch as a microtubule dynamics regulator and that activation of Notch signaling results in increased microtubule stability [73]. The RBPJL/RITA association raises the possibility that RITA-mediated regulation of Notch signaling may be influenced by RBPJL and potentially play a role in the chemosensitivity of Tu agents. The 3rd most frequent defective gene is FLT1 (Fms-related tyrosine kinase (FLT) or VEGF receptor 1). The role of FLT1 in the chemosensitivity of tubulin agents would appear to be unexpected. However, the blockade of VEGFR-1 and VEGFR-2 enhances paclitaxel sensitivity in gastric cancer cell lines [74]. Microtubule-targeted drugs inhibit VEGF Receptor-2 expression by both transcriptional and post-transcriptional mechanisms [75]. Novel anti-mitotics, which target the mitotic spindle through interactions with non-microtubule mitotic mediators like mitotic kinases and kinesins, have been identified and are now in clinical testing [76]. Included in clinical testing are compounds that have low nanomolar potency against ABL, FLT1 and PDGFR [77]. Tumor endothelial cell lines demonstrate a strong activation of VEGF and Notch signaling [78]. VEGF-B is a growth factor that binds FLT1 and is considered the odd member of the VEGF family, with mainly angiogenic and lymphangiogenic activities. VEGF-B has protective effects on neuropathy [79]. FLT1 has been proposed as a prognostic indicator in endometrial carcinoma [80]. 
The 4th most frequent defective gene, MMP9 (matrix metalloproteinase 9), and its associated vascular endothelial growth factor (VEGF) are critical for tumor vascularization and invasion. A recent study of the expression of MMP-9 and VEGF(FLT1) in breast cancer patients found their correlation significant enough to propose these genes as prognostic indicators [81]. Inspection of these SOM meta-clades finds MOA:PK agents to be located mainly in the upper portion of SOM meta-clades 16 through 18, where defective genes MMP9 and RBPJL also appear. Crizotinib is co-projected to these SOM nodes. Crizotinib is a small molecule TKI that inhibits the activity of the ALK fusion proteins, MET, ROS1, and MST1R (RON) [82,83]. Noteworthy are the impressive clinical responses to crizotinib and other small-molecule drugs inhibiting the ALK tyrosine kinase [84]. Defective MMP9 or RBPJL may contribute to enhanced crizotinib chemosensitivity. MOA:HSP90 is the 2nd most frequent MOA class in SOM meta-clades 16 through 18. Several studies have suggested a possible connection between HSP90 and the microtubule cytoskeleton. Weis et al. [85] find that HSP90 protects tubulin against thermal denaturation. Anti-tumor selectivity of a novel tubulin and HSP90 dual-targeting inhibitor has been identified in a non-small cell lung cancer model [86]. The presence of geldanamycin within the list of agents in this SOM region is consistent with this observation. Liu et al. [87] find evidence that mis-regulated HSP90 can affect drug sensitivity, an effect proposed to be due to the altered regulation of HSP90 client proteins, inclusive of tubulin.

Results: Group E (meta-clades 19 through 20)

Contingency scores order the defective genes as: BRAF, EGFR, JAK3, RPTOR, PIK3CG and SPTBN1 (Panel A). BRAF dominates the central region of SOMDTP for group E (Panel B). Nine MOA classes, ordered from most to least node counts, are PK, BCR-ABL, HDAC, NonCan, PSM, A7, Ds, HSP90 and Ho (Panels C and D). The most frequent defective gene is BRAF, which is associated with the most frequent MOA, MOA:PK, followed by MOA:BCR-ABL and MOA:HDAC. PIK3CG, RPTOR and JAK3 are the 2nd, 3rd and 4th ranking defective genes. This SOMDTP region corresponds to the projection of known FDA approved BRAF targeting agents: dabrafenib, hypothemycin, selumetinib and vemurafenib (S17 master_appendix sheet gp_E_FDA). These results are consistent with the findings of Ikediobi et al. [4].

Results for group E (meta-clades 19 through 20).

The SOMDTP region displayed in Panel B represents the boundary for meta-clades 19 through 20 (see the white border in ). S10 master_appendix sheet gp_E lists the table in Panel C. See legend to for additional details. S17 master_appendix sheet gp_E_FDA lists the FDA compounds associated with these defective genes. The association of defective BRAF with compounds that target this condition are well documented [4,88]. Mutant BRAF (v-Raf murine sarcoma viral oncogene homolog B1) inhibitors such as vemurafenib and dabrafenib have achieved unprecedented clinical responses in the treatment of melanomas [89,90]. The association of defective BRAF to MOA:HDAC is consistent with literature reports. Recent studies have shown that histone deacetylase (HDAC) and mutant BRAF (v-Raf murine sarcoma viral oncogene homolog B1) inhibitors synergistically kill melanoma cell lines with activating mutations in BRAF by induction of necrosis [91]. A role for defective PIK3CG is indicated in SOM meta clades 19 through 20 for MOA:PK and MOA:BCR-ABL. The publications from Shi et al. [92], Van Allen et al. [93] and Rizos et al. [94] addressed the roles of PI3K pathway gene’s mutations. Resistance to BRAF inhibitors can be associated with upregulation of the PI3K/AKT pathway, resulting from AKT1/3 mutations and mutations in positive (PIK3CA, PIK3CG) and negative (PIK3R2, PTEN and PHLPP1) regulatory genes [95]. The results in and D indicate a role for HDAC in BRAF chemosensitivity. Gallagher et al. [96] find that HDAC inhibitors affect BRAF-inhibitor sensitivity by altering PI3K activity. A role for defective RPTOR is indicated for MOA:BCR-ABL. Drugs simultaneously targeting two or more pathways essential for cancer growth could slow or prevent the development of resistant clones. Puausova et al. [97] identify dual inhibitors of proliferative pathways in human melanoma cell lines bearing the V600E activating mutation of BRAF kinase. 
They found these inhibitors to simultaneously disrupt the BRAF V600E-driven extracellular signal-regulated kinase (ERK) mitogen-activated protein kinase (MAPK) activity and the mechanistic target of rapamycin complex 1 (mTORC1) signaling in melanoma cell lines, yielding dynamic changes in mTOR(RPTOR) signaling. The non-receptor tyrosine Janus kinases (JAK) are involved in various processes such as cell growth, development, or differentiation. The result presented here finds an enhanced chemosensitivity to HDAC inhibitors for tumor cell lines with defective JAK3. Dual kinase and HDAC inhibitors have been developed based on the reasoning that specifically blocking more than one oncogenic pathway simultaneously with a combination of different drugs may be a more effective cancer treatment [98]. Dual inhibitors of Janus kinases and HDAC have been developed [99,100]. As an example, Dymock’s group has designed and synthesized a novel series of dual JAK and HDAC inhibitors based on the core features of ruxolitinib [101]. Upregulation of JAK3 has been observed in response to increases of oxygen-containing species following HDAC inhibition [102]. Although the design of dual JAK/HDAC inhibitors was based on simultaneously targeting different oncogenic pathways, a role for defective JAK in chemosensitivity may be important.

Results: Group F (meta-clades 21 through 24)

Contingency scores order the defective genes as: AXL, STAT2, CDKN2A, RPTOR, KRAS and MYC (Panel A). CDKN2A is the dominant defective gene in meta-clade 21 while RPTOR and STAT2 are located primarily in meta-clade 22 (Panel B). Eighteen MOA classes exist, with MOAs appearing with the highest counts all targeting DNA (Ds, T2, A7, Db, Df and A2) (Panels C and D). CDKN2A is the most frequent defective gene, followed by AXL, MYC, STAT2 and KRAS. Meta-clades 21 through 24 represent, by far, the largest number of FDA approved agents. These defective genes affect proliferation largely resulting from their role in targeting DNA, DNA damage repair and phosphorylation. These genes do not overlap with a prior analysis of DNA repair genes in the NCI60 and their predictive value for anticancer drug activity [103].

Results for group F (meta-clades 21 through 24).

The SOMDTP region displayed in Panel B represents the boundary for meta-clades 21 through 24 (see the white border in ). S11 master_appendix sheet gp_F lists the table in Panel C. See legend to for additional details. S18 master_appendix sheet gp_F_FDA lists the FDA compounds associated with these defective genes. Su et al. [104] report that CDKN2A loss is significantly associated with the sensitivity of CDK4/6 inhibitors (also projected to SOM meta-clade 14). Evidence supports the role of CDKN2A in cell cycle independent functions such as DNA damage repair [105]. CDKN2A also provides instructions for making several proteins, including p16(INK4A) and p14(ARF), which function as tumor suppressors that keep cell lines from growing and dividing too rapidly or in an uncontrolled way. Overexpression of CDKN2A inhibits cell proliferation and invasion, to cause cell cycle arrest in the G1 phase. CDKN2A mediates the AKT–mTOR (RPTOR) signaling pathway by suppressing lactate dehydrogenase (LDHA) [106]. Taken together, these results suggest therapeutic agents that target CDKN2A and RPTOR in cancers that share these defective genes. Consistent with chemosensitivity for FDA compounds in these meta-clades, recent observations report that long term survivorship after high dose DNA damaging chemotherapy with melphalan is compatible with an increased chemosensitivity due to impairment of the DNA repair pathway [107]. Loss of CDK2 presents a different challenge to cell lines, aside from the more conventional role to regulate cyclins, which in turn might lead to altered DNA damage response and checkpoint activation, mutations in DNA repair genes drive cancer development [108,109]. Ras proteins play a crucial role as a central component of the cellular networks controlling a variety of signaling pathways that regulate growth, proliferation, survival, differentiation, adhesion, cytoskeletal rearrangements and motility of a cell [110]. 
KRAS (Kirsten-rat sarcoma viral oncogene homolog) is a prominent oncogene that has been proven to drive tumorigenesis, modulate numerous genetic regulatory mechanisms including the induction of DNA damage repair pathways [111,112]. Mutant RAS-driven tumorigenesis arises independently of wild-type RAS isoforms, but recent evidence indicates wild-type isoforms are involved. Grabocka and colleagues [113] report how the loss of wild-type RAS alters oncogenic signaling and dampens the DNA-damage response, thereby affecting tumor progression and chemosensitivity. Since the MOA agents listed for SOM meta-clades 21 through 24 have roles in DNA damage, defective CDKN2A, RPTOR and KRAS may contribute to chemosensitivity of tumor cell lines to these agents. While targeting defective KRAS remains elusive [114], small molecule inhibitors are in the pipeline [115]. Exploration of NCI60 screened compounds that project to meta-clades 21 through 24 may provide a starting point for lead discovery. Pyrazoloacridine, palbociclib, methotrexate, fluorouracil, 8-Chloro-adenosine, pralatrexate, pemetrexed, pelitrexol, by-product_of_CUDC-305, 6-Mercaptopurine and oxaliplatin appear most frequently in the SOM region for group F (S18 master_appendix sheet gp_F_FDA). A study of gastric cancer patients detected a high frequency of mutations in MLL4, ERBB3, FBXW7, MLL3, mtor(RPTOR), NOTCH1, PIK3CA, KRAS, ERBB4 and EGFR [116]. KRAS mutations have been reported as predictors of the response of lung adenocarcinoma patients receiving platinum-based chemotherapy [117,118]. NOTCH1 mutations target KRAS mutant CD8+ cells to contribute to their leukemogenic transformation [119,120]. Notable in the list of FDA approved agents associated with SOM meta-clades 21 through 24 is oxaliplatin. Oxaliplatin-based chemotherapy is more beneficial in KRAS mutant than in KRAS wild-type metastatic colorectal cancer [121]. 
SOM meta-clade 21 is the location of cytarabine (ara-C) and is consistent with the conclusion of Ahmad et al. [24] that adult AML patients carrying defective KRAS benefit from higher ara-C doses more than wild-type KRAS patients. Enhanced chemosensitivity of tumor cell lines with defective KRAS may represent a link to these observations.

Results: Group G (meta-clades 25 through 28)

Contingency scores order the defective genes with PEG3, ABL1 and PIK3CG as the most significant and MTOR, KRAS and NRAS as the least significant (Panel A). PIK3CG represents the largest count of SOMDTP projections, located mainly in the central region of meta-clades 25 through 28. MTOR, KRAS and NRAS are located mostly in the bottom of this region (Panel B). Twelve MOA classes exist, with MOAs appearing with the highest counts as Ds, Apo, and PK (Panels C and D). The most frequently occurring defective genes are PIK3CG, NRAS, PTK2 and ABL1 (Panels C and D). The defective genes in meta-clades 25 through 28 represent an amalgamation of many of the previous meta-clade groups, where sets of defective genes were involved in cellular processes of phosphorylation and progression through the cell cycle for proliferation. Consequently, many of these defective genes have been previously discussed, with the exceptions of NRAS and PTK2.

Results for group G (meta-clades 25 through 28).

The SOMDTP region displayed in Panel B represents the boundary for meta-clades 25 through 28 (see the white border in ). S12 master_appendix sheet gp_G lists the table in Panel C. See legend to for additional details. S19 master_appendix sheet gp_G_FDA lists the FDA compounds associated with these defective genes.

NRAS (ranked 2nd by node counts) is one of the most common targets of oncogenic signaling mutations in hematologic malignancies. Even with the challenge of directly targeting mutant RAS oncoproteins, mitogen-activated protein kinase (MAPK) inhibition has been shown to reduce leukocytosis by targeting the downstream pathway of NRAS [122]. As noted earlier, combinations of targeted agents may not supersede conventional cytotoxic regimens; however, combinations may enhance treatment efficacy. Identifying compounds that target defective NRAS and other compounds that target defective PTK2, ABL1 or PIK3CG, in the case of meta-clades 25 through 28, may offer effective combination therapies. Without doubt, large numbers of molecular pathways are likely to be synergistically involved in cancer biology; the contribution of each pathway may differ, and identifying which combinations to select will be experimentally exhausting. Bioinformatic approaches as discussed herein may offer useful clues. PTK2 (ranked 3rd by node counts) is a non-receptor protein-tyrosine kinase with functions that include cell migration, reorganization of the actin cytoskeleton, cell cycle progression, cell proliferation and apoptosis through kinase-dependent and -independent mechanisms [123]. It is a member of the FAK (focal adhesion kinase) subfamily of protein tyrosine kinases and is listed as a transcriptional regulator. FAKs are reported to modulate chemosensitivity by altering chemokine production [31]. Enhanced chemosensitivity to gemcitabine has been reported with interference of FAKs [124]. Because of the involvement of PTK2 (FAK) in many cancers, drugs that inhibit FAK are being sought and evaluated [125]. A screen to identify mechanisms of bleomycin resistance identified Sky1, PTK2 and Agp2 as determinants of chemosensitivity [126].

Discussion

The development of rational strategies for targeted cancer therapy will require integrative analysis of data derived from diverse sources including, but not limited to, large-scale, publicly available, pre-clinical and clinical small-molecule screening and genomic data. A widely accepted challenge of linking screening and genomic data is how to gain molecular insight into the MOA(s) of active compounds. Not unexpectedly, the range of potentially important links is enormous, yielding massive challenges to the development of statistical/computational bioinformatic tools that assist integrative analyses. Advances have been made by focusing studies on fewer compounds (24 compounds in the CCLE [127] or approved FDA compounds [128]) or by studying small numbers of driver or mutated genes [129]. The results of the present study demonstrate the power of combining genomic data and small-molecule screens of FDA compounds in the NCI60 to provide mechanistic clues about compound activity. These results reveal coarse-grained associations between the chemosensitivity to target-directed FDA agents and tumor cell lines harboring specific genetic defects. SOM clustering finds seven regions of GI50NCI60 responses, broadly assigned to FDA MOA classes that target, though not exclusively, tubulin, BRAF mutations, the RAF/MEK/ERK/mTOR and PI3K/AKT pathways, DNA or protein synthesis pathways, and the cell cycle. Each MOA class is associated with a relatively unique set of defective genes. Salient associations include the role of defective MYC for tubulin targeting agents, defective CDKN2A, NRAS and KRAS for DNA damaging/targeting agents and the role of defective NOTCH1 for mutant BRAF targeting agents. Remarkably, nearly half of the defective genes reported herein also appear in Ikediobi et al. [4], albeit using very different methods. The results described here may be applied to future pre-clinical studies. 
Notably, there are exploitable instances of enhanced chemosensitivity of compound MOAs for a few defective genes. Specifically, there is support for synthetically lethal defective genes as contributing to chemosensitivity. Defective genes exist within the NCI60 as doublets, triplets, quartets, etc., and a subset of these genes are associated with tumor cell lines that exhibit chemosensitivity. Exploiting chemosensitive SOMDTP nodes that are associated with tumor cell lines having more than one defective gene, and that are also associated with numerous screened compounds, may identify additional synthetic lethal strategies. The notion of targeting parallel pathways can be extended beyond synthetically lethal genes. Combining agents with enhanced chemosensitivity against one defective gene, and its related cellular pathways, with other agents showing enhanced chemosensitivity towards other defective genes in alternative pathways, may enhance the efficacy of each agent. For example, each SOMDTP node with significant chemosensitivity for one defective gene includes many NSCs with similar GI50NCI60 responses, inclusive of FDA compounds. Combinations of NSCs from SOMDTP nodes also exhibiting differential chemosensitivity for one or more defective genes in parallel pathways may be considered for experimental testing. The goal would be to identify combinations of NSCs that separately target parallel cellular pathways to determine whether their combination would enhance individual efficacies. The bioinformatic analysis described herein may provide clues for experimental pre-clinical testing of possible drug combinations. Important caveats underlie the interpretation of the results presented here. First, links of defective genes to chemosensitivity are not revealed in a clear-cut manner. Rarely is chemosensitivity associated only with tumor cell lines harboring defective genes. Chemosensitivity also exists within tumor cell lines lacking defective genes (cf. ). 
Consequently, while gene-drug associations may provide a genetic basis for drug selection [130], there is clear evidence herein that additional, not well understood, factors are in play. Second, combinations of defective genes appear to play a role in chemosensitivity. For example, 44 defective genes are listed in the tables provided in each of the RESULTS subsections. Eighty-eight percent of these genes are listed only once (N = 24) or twice (N = 14) across the seven meta-clade groups. This result is an indication that relatively few defective genes contribute to enhanced chemosensitivity across meta-clade groups. In contrast, only two (RPTOR and MTOR) and three genes (PIK3CG, MYC and CDC25A) appear jointly in four or three of the seven meta-clade groups, respectively. Consequently, identifying a single defective gene as responsible for chemosensitivity may be rare; while combinations with genes commonly labeled as cancer genes may be more likely. Third, the 44 defective genes listed in the RESULTS subsections can be compared to current compendia of cancer gene mutations derived from human studies. The Cancer Genome Interpreter [131] has been developed to classify protein-coding somatic mutations and copy number variants into predicted passenger or known/predicted oncogenic mutations. Half (N = 22) of the defective genes listed here are identified by the Cancer Genome Interpreter’s encyclopedia of patient-derived tumor xenografts (PDX) as driver mutations. A recent report using driver mutation patterns for prioritization of personalized cancer therapy [132] finds nearly 20% of their 39 tumor biomarkers to be included in this set of defective genes. Although the defective genes listed here were derived from novel applications of bioinformatic tools, these results find support within other databases. The absence of overlapping genes suggests potentially important roles for non-driver genes in chemosensitivity. 
Fourth, global analysis of modest to large scale genomic and screening data offers only one perspective. The genetic make-up of the NCI60 represents only a snapshot of data for a small number of tumor cell lines. The universal application of results derived from the NCI60 may be relevant only in the rare instance that another tumor cell matches the genetic makeup of any NCI60 cell. This does not, however, rule out analyses, parallel to that presented here, that jointly examine existing and new data. Fifth, the absence of defective TP53 in these results has not gone undetected. Most NCI60 tumor cell lines harbor defective TP53. As a result, establishing a statistically significant Student’s t-test for selective chemosensitivity fails mainly due to too few responses of tumor cell lines lacking defective TP53. Extending the data analysis to more tumor cell lines, lacking mutant TP53, may prove helpful. While addressing each of these caveats is massively challenging, resolution of each issue contributes to the understanding of preclinical screening results derived from a small set of human tumor cell lines. In summary, the challenge of finding meaningful results within complex and noisy data has been addressed using contemporary data and state-of-the-art statistical tools. This global analysis of multiple datasets, overlapping in their origins within the NCI60, has provided a unique perspective for associations of chemosensitivity, defective genes and MOAs.
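The TP53 caveat can be made concrete with a minimal sketch of a pooled-variance Student's t-test, assuming two groups of cell lines (with and without a defective gene) compared on their GI50 responses. The group sizes and GI50 values below are hypothetical and chosen only for illustration; the source states only that most NCI60 lines carry defective TP53. The function names are likewise illustrative and are not the study's implementation.

```python
from math import sqrt
from statistics import mean, variance


def se_factor(n_defective: int, n_wild_type: int) -> float:
    """Group-size term sqrt(1/n1 + 1/n2) that scales the standard error."""
    return sqrt(1.0 / n_defective + 1.0 / n_wild_type)


def pooled_t(defective, wild_type):
    """Student's t statistic (pooled variance) and its degrees of freedom."""
    n1, n2 = len(defective), len(wild_type)
    if n1 < 2 or n2 < 2:
        raise ValueError("each group needs at least two cell lines")
    pooled_var = ((n1 - 1) * variance(defective)
                  + (n2 - 1) * variance(wild_type)) / (n1 + n2 - 2)
    t_stat = (mean(defective) - mean(wild_type)) / (
        sqrt(pooled_var) * se_factor(n1, n2))
    return t_stat, n1 + n2 - 2


# Hypothetical group sizes: a balanced split versus a split in which
# almost every cell line carries the defect.
print(se_factor(30, 29), se_factor(56, 3))

# Hypothetical GI50 values (-log10 molar) for one SOM node.
t_stat, dof = pooled_t([6.0, 6.2, 6.4], [5.0, 5.2, 5.4])
print(t_stat, dof)
```

For the same difference in means and the same pooled variance, the unbalanced split gives a standard error more than twice that of the balanced split, so the t statistic shrinks accordingly. With fewer than two cell lines in either group the statistic cannot be computed at all.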

Manuscript_plos_revised_figs. (DOCX)

Manuscript_appendix_DTP. (DOCX)

13 Jan 2021

PONE-D-20-35751
Bioinformatic Analysis Linking Genomic Defects to Chemosensitivity and Mechanism of Action
PLOS ONE

Dear Dr. Covell,

Thank you for submitting your manuscript to PLOS ONE. After careful consideration, we feel that it has merit but does not fully meet PLOS ONE’s publication criteria as it currently stands. Therefore, we invite you to submit a revised version of the manuscript that addresses the points raised during the review process.

Please submit your revised manuscript by Feb 27 2021 11:59PM. If you will need more time than this to complete your revisions, please reply to this message or contact the journal office at plosone@plos.org. When you're ready to submit your revision, log on to https://www.editorialmanager.com/pone/ and select the 'Submissions Needing Revision' folder to locate your manuscript file.

Please include the following items when submitting your revised manuscript:

A rebuttal letter that responds to each point raised by the academic editor and reviewer(s). You should upload this letter as a separate file labeled 'Response to Reviewers'.

A marked-up copy of your manuscript that highlights changes made to the original version. You should upload this as a separate file labeled 'Revised Manuscript with Track Changes'.

An unmarked version of your revised paper without tracked changes. You should upload this as a separate file labeled 'Manuscript'.

If you would like to make changes to your financial disclosure, please include your updated statement in your cover letter. Guidelines for resubmitting your figure files are available below the reviewer comments at the end of this letter.

If applicable, we recommend that you deposit your laboratory protocols in protocols.io to enhance the reproducibility of your results. Protocols.io assigns your protocol its own identifier (DOI) so that it can be cited independently in the future. For instructions see: http://journals.plos.org/plosone/s/submission-guidelines#loc-laboratory-protocols

We look forward to receiving your revised manuscript.

Kind regards,
Rama Krishna Kancha
Academic Editor
PLOS ONE

Journal Requirements:

Additional Editor Comments:

The reviewers are of the opinion that the methods section is not described enough for the study to be replicated. A detailed introduction and methods section is warranted. Further, all the changes in figures as well as in the results and discussion sections as suggested by the reviewers need to be included. The main findings of the study have to be highlighted clearly.

Comments from the Reviewer 1 (in case the author can't access the attached file):

OVERALL

This manuscript combines three data sources: drug response data on the NCI60 cell lines which is clustered using self-organizing maps, gene alteration data on the NCI60 cell lines, and annotations of the mechanism of action (MOA). The primary goal is to relate disrupted or defective genes to mechanisms of action using the clustered drug compounds as an intermediary. The methods do not appear to be particularly novel (various clustering techniques, t-tests, and Fisher's Exact Tests) and many of the identified genes are already well-known as drivers of cancer and as targets of specific therapies. Nevertheless, there is utility in terms of the scope of the analysis. Unfortunately, the methods are not described in adequate detail, and there appears to have been no adjustment to account for multiple testing. Thus, in its present form, it is not possible to fully tell if the results are actually correct. Moreover, the results section appears to contain a substantial amount of discussion. Finally, the results-discussion appear overly long, giving the impression of a "data dump" without adequate attention having been paid to highlight the most significant aspects of the findings.
MAJOR COMMENTS

[1] A fundamental problem with this manuscript is that it would be completely impossible for anyone to replicate the results. The author never explains what the DTP is. This reviewer happens to know that it is the Department of Therapeutic Programs at the NIH. Going to their web site and guessing that (1) I need to follow the “bulk downloads” link and (2) I need to follow the "NCI60 Growth Inhibition Data" link gets me to a page with another decision. Which release of this data was used? October 2020? June 2020? Something else? Downloading the October 2020 release, one learns a couple of things. First, it contains data on about 55K compounds, more than the 53K mentioned in the manuscript. Second, it contains data on 159 cell lines, not just the 59 that are part of the NCI60. Leaving the reader in the dark about exactly which data were used (even if it is publicly available) is unacceptable.

[2] In general, the methods are also described too vaguely and with too little detail to allow a reader to replicate the results.

[3] Was the CV computation used for filtering compounds based on the negative log-transformed GI50 data, or was it computed on some other scale?

[4] Please describe in more detail the heuristic that was used to decide on the size of the SOM grid, or provide a complete reference.

[5] The text twice mentions a manuscript by "Holbeck et al." The first time, it does not link to a reference in the bibliography. The second time, it links to reference #19, by Whyte and Holbeck. Please fix this.

[6] I don't understand the methods (described on page 10 of the document for review, which contains lots of introductory pages before the manuscript actually starts) associated with Box D. Since the nodes in the SOM map contain drugs, how are we able to label some of those nodes based on the mutations of tumor cell lines? I _think_ the point is that the "codebook" for each node consists of an average (over the compounds assigned to the node) expression level for each cell line, which is followed by a t-test comparing the expression between cell lines with and without an abnormality. This step seems to be followed by a permutation test to compute empirical p-values. But this means that they have performed more than 4.5 million statistical tests, without making any kind of adjustment for multiple testing. Why should I believe that a cutoff of "p < 0.05" produces anything but random noise in this context? In fact, he may have found *fewer* significant nodes than expected by chance (when you consider that you are performing 368 tests, each with a 5% chance of being picked randomly, per node).

[7] Why was the association between clusters and defective genes performed at the level of 1232 SOM nodes (clades) and not at the level of 28 meta-clades or 7 meta-clade groupings?

[8] The caption for Figure 1 needs to explain/define the numerous abbreviations and acronyms employed. One also suspects that the duplication of "T1" in Box B should actually read "T1, T2".

[9] Figure 2 should include a colorbar to make it easier for readers to interpret the SOM plot. It would help if the author prevented the plot title from overlapping the list of compounds. It would also help to label the subpanels (with something like A, B, C) and to make it clearer why there are two histograms and how they differ. This is especially true since the figure caption does not discuss the histograms. Also, both histograms should use "inter-node" and not the spurious plural.

[10] None of the graphical methods in Figure 3 appear to contain a compelling reason to select the numbers of clusters ("meta-clades") proposed by the author. Nor do they justify the collapse of those categories into seven "meta-clade groupings".

[11] Using a gradient-based color scheme to label discrete meta-clades and meta-clade groupings in Figure 4 is potentially highly misleading. For one thing, it is extremely difficult to tell if the color boundaries actually align with valid branches in the accompanying dendrogram, especially for the gray-scale meta-clade groupings. For visualization, the author might want to consider the Polychrome R package, which can produce a wider variety of distinct colors for displaying discrete classes.

[12] More importantly, he might want to consider other clustering methods (K-means? PAM?) in light of the fact that the hierarchically defined meta-clades frequently break up into distinct subregions of the SOM display, indicating an incompatibility of clusters between the two methods. As sort-of noted by the author, this disparity is probably caused by changing distance metrics. Both hierarchical clustering and PAM can use the exact same distance metric that was applied when performing SOM clustering. And it would be important to use the same distance metric when applying methods to estimate the number of clusters. Visualizing the data with alternative methods such as t-SNE or UMAP might also provide additional insight into whether one should believe the clustering results.

[13] It would be useful to see a version of the original SOM plot (from Figure 2) with the meta-clade boundaries (from Figure 4) superimposed. After all, the original plot indicates the relative distance between neighboring SOM nodes, so it would provide greater support for the clustering if the boundaries tended to run through the yellow areas where neighbors aren't particularly close.

[14] From the figure and its caption, it is not clear what we are supposed to learn from Figure 5. My immediate reaction is that ABL1 is not particularly associated with GI50 values in this SOM node. What I get from this plot is that perhaps one should be using a Wilcoxon rank sum test instead of a t-test to associate defective genes with SOM nodes. I have a similar reaction to Figure 6.

[15] Figure 7 appears to be somewhat misleading by over-representing the amount of data associated with the CellMiner MOA annotations. While the SOM clusters were created from almost 47000 compounds, the MOA data is only available for 104 compounds, which is only about 0.22% of the total compounds considered. Each of those 104 compounds is then apparently used up to eight times. Is there any statistical significance to any of these "interpretive" assignments?

[16] Frankly, I am unclear on exactly how many times Fisher's Exact test was used to associate MOA's with defective genes. Among other things, I have completely lost track of whether these counts are based on SOM nodes, meta-clades, or meta-clade groups. And, once again, there is no sign that any corrections to p-values have been made to account for multiple testing.

[17] Beginning on page 17 of the document for reviewers, the author drifts from presenting results to discussing their interpretation. It would be easier for the reader if these parts of the manuscript remained more clearly separated.

[18] It would be nice if the font sizes in the numerous tables were large enough to be readable.

Journal requirements:

When submitting your revision, we need you to address these additional requirements.

1. Please ensure that your manuscript meets PLOS ONE's style requirements, including those for file naming. The PLOS ONE style templates can be found at https://journals.plos.org/plosone/s/file?id=wjVg/PLOSOne_formatting_sample_main_body.pdf and https://journals.plos.org/plosone/s/file?id=ba62/PLOSOne_formatting_sample_title_authors_affiliations.pdf

2. 
Thank you for stating in your Funding Statement: "This project was funded in whole or in part with Federal funds from the National Cancer Institute, National Institutes of Health, under Contracts No. HHSN261200800001E and HHSN261201700007I." Please provide an amended statement that declares *all* the funding or sources of support (whether external or internal to your organization) received during this study, as detailed online in our guide for authors at http://journals.plos.org/plosone/s/submit-now. Please also include the statement “There was no additional external funding received for this study.” in your updated Funding Statement. Please include your amended Funding Statement within your cover letter. We will change the online submission form on your behalf. 3. Please include captions for your Supporting Information files at the end of your manuscript, and update any in-text citations to match accordingly. Please see our Supporting Information guidelines for more information: http://journals.plos.org/plosone/s/supporting-information. [Note: HTML markup is below. Please do not edit.] Reviewers' comments: Reviewer's Responses to Questions Comments to the Author 1. Is the manuscript technically sound, and do the data support the conclusions? The manuscript must describe a technically sound piece of scientific research with data that supports the conclusions. Experiments must have been conducted rigorously, with appropriate controls, replication, and sample sizes. The conclusions must be drawn appropriately based on the data presented. Reviewer #1: Partly Reviewer #2: Yes ********** 2. Has the statistical analysis been performed appropriately and rigorously? Reviewer #1: No Reviewer #2: I Don't Know ********** 3. Have the authors made all data underlying the findings in their manuscript fully available? 
The PLOS Data policy requires authors to make all data underlying the findings described in their manuscript fully available without restriction, with rare exception (please refer to the Data Availability Statement in the manuscript PDF file). The data should be provided as part of the manuscript or its supporting information, or deposited to a public repository. For example, in addition to summary statistics, the data points behind means, medians and variance measures should be available. If there are restrictions on publicly sharing data—e.g. participant privacy or use of data from a third party—those must be specified. Reviewer #1: No Reviewer #2: No ********** 4. Is the manuscript presented in an intelligible fashion and written in standard English? PLOS ONE does not copyedit accepted manuscripts, so the language in submitted articles must be clear, correct, and unambiguous. Any typographical or grammatical errors should be corrected at revision, so please note any specific errors here. Reviewer #1: Yes Reviewer #2: Yes ********** 5. Review Comments to the Author Please use the space provided to explain your answers to the questions above. You may also include additional comments for the author, including concerns about dual publication, research ethics, or publication ethics. (Please upload your review as an attachment if it exceeds 20,000 characters) Reviewer #1: See attached file. This is filler because the poorly designed input page at Editorial Manager (which is run by people who also don't apparently know how to use the HTTP notion of domains or realms to distinguish logins for different journals) insists on at least 200 character even if you upload a separate file with the full review. We'll see if anyone realizes they can ignore this part. Reviewer #2: In the manuscript, the author presents work to integrate chemosensitivity of cancer cell lines with their genomic defects and drug mechanisms of FDA-approved agents. 
The work is an important and useful study, particularly as medical oncology moves towards a targeted, individualized approach to treat each individual’s cancer with a personalized drug combination. Drug re-purposing and drug combinations, along with treatment sequence, all play into this, and the more information we have about how chemosensitivity, genomic alterations and drug action link together, it may be possible to discover how drugs synergize to create a lethal combination to treat many cancers. Knowing the genetic vulnerabilities of an individual’s tumor type is becoming more possible and cost-efficient with the advances in sequencing technologies. The author extends work by Ikediobi et al. by the inclusion of more screened compounds (~53k vs ~8k) and a larger set of gene mutations (N=365 vs N=24). A novel analysis of self-organizing-maps (SOMs) integrated data from three different sources (NCI60, cBioPortal and CellMiner) and derived links among tumor cell chemosensitivity, genetically defective genes and mechanisms of action (MOA). While this study is limited to the analysis of data from cancer cell lines, the results are applicable and could be beneficial for pre-clinical studies. This is pointed out in the Discussion as cell lines are not fully representative of human tumors. Overall the author could improve the structure of the manuscript for readability. For example, the introduction is very brief and ends abruptly with the bullet point list. This should be expanded. The Methods/Data focuses on describing the boxes in Figure 1. A better approach might be to break this into headings, and then refer to Figure 1, Box A, rather than have Box A as the heading. This is particularly the case when the author goes on to refer to Box B – E. Descriptive headings would be much more useful to the reader. The figure legends should have titles – what is the result of the figure? Then a description of each panel should follow. 
In several figures, it would be helpful to increase the size and include panel letters. While many figures have multiple panels, they are referred to as left/right or upper/lower etc. Panel letters would improve clarity. Many figure legends are too brief. Expansion of the figure legends to fully explain the figures would improve readability of the manuscript. For someone not directly involved in the field, it would be helpful to define terms, i.e. DTP, NSC, etc. This is also true for gene/proteins, like KRAS, etc., and cancer types (TNBC). There is a misuse of verb tense throughout the manuscript. For example, “Prior analyses find…” – this should be “found”. Many parts are written as if the work is proposed, not completed. Gene names should be italicized (not bold); human proteins in all capitals, non-italicized. This is standard convention and would improve the clarity when moving between defective genes and their resultant proteins and protein activities. In some cases, work of others is referred to without the reference, i.e. in Appendix Figure 2 and Holbeck et al. (in the description for Box C). Figure 2 – it is very difficult to see the drug names that are in white against the colored SOMDTP projections. The MOAs for the drug lists should be specified on the figure. This is listed in the text, but better labelling of the figure would improve clarity. Figure 3 – the x-axis on each plot is very difficult to see. How is the inflection point of gap statistic and within sum of squares method calculated? Was this a calculation or interpreted from the curves? With this and other figures, more details and primary data used to do these analyses should be included. Pg 13 – surely there is a reference for nonlinear dimension reduction that is not Wikipedia? Pg 14/Figure 5 – it would be helpful to have an indication/description of what the cell lines are. This is done for Figure 6; however, it should read, “Here there are 12 tumor cell lines…” not tumor cells. 
The clinical significance of the results of the study could be expanded. What will be the next steps using the results obtained?

For Appendix Figures 3 and 4: is the result of the defective PI3KR1 and IGFR1 surprising?

The headings for Table II could be improved to describe what will follow. More detail could be added to improve readability.

Figures 9–14: each right panel displays partial SOMDTP meta-clades, which are shown fully in Figure 4. It would be helpful to refer back to Figure 4 or superimpose the full SOMDTP for context.

The author refers to ‘defective’ genes throughout; however, it would be useful to point out that mutations, copy number alterations and/or fusion/splice changes can enable and ‘over-activate’ or enhance gene function. Defective implies lack of function, which is not the case with some of the genes mentioned, particularly in a cancer context. Some specific examples do not fall into this (i.e. p. 27 V600E activating mutation of BRAF); however, the author should clearly delineate the difference between defective/lack-of-function versus altered/gain-of-function.

**********

6. PLOS authors have the option to publish the peer review history of their article. If published, this will include your full peer review and any attached files. If you choose “no”, your identity will remain anonymous but your review may still be made public. Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy.

Reviewer #1: Yes: Kevin R Coombes
Reviewer #2: No

[NOTE: If reviewer comments were submitted as an attachment file, they will be attached to this email and accessible via the submission site. Please log into your account, locate the manuscript record, and check for the action link "View Attachments". If this link does not appear, there are no attachment files.]

While revising your submission, please upload your figure files to the Preflight Analysis and Conversion Engine (PACE) digital diagnostic tool, https://pacev2.apexcovantage.com/. PACE helps ensure that figures meet PLOS requirements. To use PACE, you must first register as a user. Registration is free. Then, login and navigate to the UPLOAD tab, where you will find detailed instructions on how to use the tool. If you encounter any issues or have any questions when using PACE, please email PLOS at figures@plos.org. Please note that Supporting Information files do not need this step.

Submitted filename: review.docx

1 Mar 2021

Please see response to reviewers.

Submitted filename: PONE_reviewer_comments_revised.docx

17 Mar 2021

Bioinformatic Analysis Linking Genomic Defects to Chemosensitivity and Mechanism of Action

PONE-D-20-35751R1

Dear Dr. Covell,

We’re pleased to inform you that your manuscript has been judged scientifically suitable for publication and will be formally accepted for publication once it meets all outstanding technical requirements.

Within one week, you’ll receive an e-mail detailing the required amendments. When these have been addressed, you’ll receive a formal acceptance letter and your manuscript will be scheduled for publication.

An invoice for payment will follow shortly after the formal acceptance. To ensure an efficient process, please log into Editorial Manager at http://www.editorialmanager.com/pone/, click the 'Update My Information' link at the top of the page, and double check that your user information is up-to-date. If you have any billing related questions, please contact our Author Billing department directly at authorbilling@plos.org.

If your institution or institutions have a press office, please notify them about your upcoming paper to help maximize its impact. If they’ll be preparing press materials, please inform our press team as soon as possible -- no later than 48 hours after receiving the formal acceptance. Your manuscript will remain under strict press embargo until 2 pm Eastern Time on the date of publication. For more information, please contact onepress@plos.org.

Kind regards,
Rama Krishna Kancha
Academic Editor
PLOS ONE

Additional Editor Comments (optional): The reviewers approved the revised version for acceptance following the incorporation of suggested changes and addressing their concerns.

Reviewers' comments:

Reviewer's Responses to Questions

Comments to the Author

1. If the authors have adequately addressed your comments raised in a previous round of review and you feel that this manuscript is now acceptable for publication, you may indicate that here to bypass the “Comments to the Author” section, enter your conflict of interest statement in the “Confidential to Editor” section, and submit your "Accept" recommendation.

Reviewer #1: All comments have been addressed
Reviewer #2: All comments have been addressed

**********

2. Is the manuscript technically sound, and do the data support the conclusions? The manuscript must describe a technically sound piece of scientific research with data that supports the conclusions. Experiments must have been conducted rigorously, with appropriate controls, replication, and sample sizes. The conclusions must be drawn appropriately based on the data presented.

Reviewer #1: Yes
Reviewer #2: Yes

**********

3. Has the statistical analysis been performed appropriately and rigorously?

Reviewer #1: Yes
Reviewer #2: I Don't Know

**********

4. Have the authors made all data underlying the findings in their manuscript fully available? The PLOS Data policy requires authors to make all data underlying the findings described in their manuscript fully available without restriction, with rare exception (please refer to the Data Availability Statement in the manuscript PDF file). The data should be provided as part of the manuscript or its supporting information, or deposited to a public repository. For example, in addition to summary statistics, the data points behind means, medians and variance measures should be available. If there are restrictions on publicly sharing data—e.g. participant privacy or use of data from a third party—those must be specified.

Reviewer #1: Yes
Reviewer #2: Yes

**********

5. Is the manuscript presented in an intelligible fashion and written in standard English? PLOS ONE does not copyedit accepted manuscripts, so the language in submitted articles must be clear, correct, and unambiguous. Any typographical or grammatical errors should be corrected at revision, so please note any specific errors here.

Reviewer #1: Yes
Reviewer #2: Yes

**********

6. Review Comments to the Author. Please use the space provided to explain your answers to the questions above. You may also include additional comments for the author, including concerns about dual publication, research ethics, or publication ethics. (Please upload your review as an attachment if it exceeds 20,000 characters)

Reviewer #1: (No Response)

Reviewer #2: As the specific analyses used are outside of my area of expertise, I cannot comment on the statistical methods and models employed. This is my shortcoming and not the author's. Importantly, I am very satisfied with the thoughtful and robust response to all of the reviewers' comments. The manuscript has been completely overhauled, and I think the clarity has vastly improved. The topic is of interest to me to inform pre-clinical studies, and I now believe this comes through more clearly with the presentation of the data, improvement in the description of methods used, better organisation of the results and a more thorough discussion. In my opinion, the author has done very interesting work that will be of benefit to pre-clinical experimental design.

**********

7. PLOS authors have the option to publish the peer review history of their article. If published, this will include your full peer review and any attached files. If you choose “no”, your identity will remain anonymous but your review may still be made public. Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy.

Reviewer #1: Yes: Kevin R. Coombes
Reviewer #2: No

22 Mar 2021

PONE-D-20-35751R1

Bioinformatic Analysis Linking Genomic Defects to Chemosensitivity and Mechanism of Action

Dear Dr. Covell:

I'm pleased to inform you that your manuscript has been deemed suitable for publication in PLOS ONE. Congratulations! Your manuscript is now with our production department.

If your institution or institutions have a press office, please let them know about your upcoming paper now to help maximize its impact. If they'll be preparing press materials, please inform our press team within the next 48 hours. Your manuscript will remain under strict press embargo until 2 pm Eastern Time on the date of publication. For more information please contact onepress@plos.org.

If we can help with anything else, please email us at plosone@plos.org.

Thank you for submitting your work to PLOS ONE and supporting open access.

Kind regards,
PLOS ONE Editorial Office Staff
on behalf of
Dr. Rama Krishna Kancha
Academic Editor
PLOS ONE

Table 1

Most frequent primary MOA assignments within meta-clade groups A:G.

Meta-clade Groups	MOA
A: 1–6	PK
B: 7–9	PK, A
C: 10–15	PK, A, D
D: 16–18	Tu, T2
E: 19–20	PK
F: 21–24	A, D, T1
G: 25–28	Apo, A, D
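
For readers who want to work with Table 1 programmatically, the grouping can be transcribed as a small lookup. This is only a minimal sketch: the names `META_CLADE_GROUPS`, `group_for_meta_clade` and `groups_with_moa` are hypothetical and not taken from the paper, and the MOA abbreviations are copied exactly as printed in the table without expansion.

```python
# Table 1 transcribed as a lookup (hypothetical names, not from the paper):
# group label -> (meta-clade numbers in the group, most frequent primary MOAs).
# MOA abbreviations are kept exactly as printed in Table 1.
META_CLADE_GROUPS = {
    "A": (range(1, 7), ["PK"]),
    "B": (range(7, 10), ["PK", "A"]),
    "C": (range(10, 16), ["PK", "A", "D"]),
    "D": (range(16, 19), ["Tu", "T2"]),
    "E": (range(19, 21), ["PK"]),
    "F": (range(21, 25), ["A", "D", "T1"]),
    "G": (range(25, 29), ["Apo", "A", "D"]),
}


def group_for_meta_clade(meta_clade):
    """Return (group label, MOA list) for a meta-clade number from 1 to 28."""
    for label, (clades, moas) in META_CLADE_GROUPS.items():
        if meta_clade in clades:
            return label, moas
    raise ValueError(f"meta-clade {meta_clade} is not listed in Table 1")


def groups_with_moa(moa):
    """Return the group labels whose most frequent MOAs include `moa`."""
    return [label for label, (_, moas) in META_CLADE_GROUPS.items() if moa in moas]


if __name__ == "__main__":
    print(group_for_meta_clade(17))
    print(groups_with_moa("PK"))
```

A lookup of this kind only restates the table; it says nothing about the statistical tests used to derive the assignments.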
