Anna Seigal, Mariano Beguerisse-Díaz, Birgit Schoeberl, Mario Niepel, Heather A Harrington.
Abstract
We introduce a tensor-based clustering method to extract sparse, low-dimensional structure from high-dimensional, multi-indexed datasets. This framework is designed to enable detection of clusters of data in the presence of structural requirements which we encode as algebraic constraints in a linear program. Our clustering method is general and can be tailored to a variety of applications in science and industry. We illustrate our method on a collection of experiments measuring the response of genetically diverse breast cancer cell lines to an array of ligands. Each experiment consists of a cell line-ligand combination, and contains time-course measurements of the early signalling kinases MAPK and AKT at two different ligand dose levels. By imposing appropriate structural constraints and respecting the multi-indexed structure of the data, the analysis of clusters can be optimized for biological interpretation and therapeutic understanding. We then perform a systematic, large-scale exploration of mechanistic models of MAPK-AKT crosstalk for each cluster. This analysis allows us to quantify the heterogeneity of breast cancer cell subtypes, and leads to hypotheses about the signalling mechanisms that mediate the response of the cell lines to ligands.
Keywords: algebra; data clustering; model selection and parameter inference; signalling networks; systems biology; tensors
Year: 2019 PMID: 30958184 PMCID: PMC6408352 DOI: 10.1098/rsif.2018.0661
Source DB: PubMed Journal: J R Soc Interface ISSN: 1742-5662 Impact factor: 4.118
Figure 1. Schematic of the constrained tensor clustering method and model identification. (a) The complete set of experiments can be represented by the multi-indexed tensor Z; see §3. (b) The similarity scores between experiments (each cell line/ligand combination) can be stored in a similarity matrix that can be used to construct a similarity tensor S, or to find a preliminary clustering of the data W that may not comply with the constraints. (c) Structured clustering via integer programming. The starting point can be either the similarity tensor S or the pre-existing clustering W. The possible clusterings are represented by points on the grid. The red line is the value of the objective function (equations (4.2) and (4.3)). The best integer value (orange point) is found inside the convex feasible region (blue). (d) A large-scale search for mechanistic models for each cluster involves parametrizing and ranking the best ODE models for each cluster. (Online version in colour.)
Figure 2. Examples of cluster shapes that are allowed and not allowed in our analysis of breast cancer data. The clusters in the first two columns are all rectangular, and thus allowed under our interpretability framework. The third column contains examples of non-rectangular clusters that are not allowed in our framework. Note that j is not necessarily equal to i + 1, and k is not necessarily h + 1. (Online version in colour.)
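The rectangularity requirement in Figure 2 can be illustrated with a minimal sketch. It assumes that a clustering is stored as a 2D integer array of labels (rows for cell lines, columns for ligands) and that "rectangular" means the cluster fills every cell in the product of the rows and columns it touches, with no requirement that those rows or columns be adjacent. The function name `is_rectangular` and the example grids are hypothetical and are not taken from the paper.

```python
import numpy as np


def is_rectangular(labels: np.ndarray, cluster: int) -> bool:
    """Check whether `cluster` fills the full rows-by-columns block it touches.

    Assumption: `labels` is a 2D array (cell lines x ligands) of cluster ids.
    Rows and columns of a cluster do not need to be adjacent.
    """
    mask = labels == cluster
    if not mask.any():
        return False  # an absent cluster is treated as not rectangular
    rows = np.flatnonzero(mask.any(axis=1))  # rows containing the cluster
    cols = np.flatnonzero(mask.any(axis=0))  # columns containing the cluster
    # Rectangular iff every (row, column) pair in the product is in the cluster.
    return bool(mask[np.ix_(rows, cols)].all())


# Illustrative grids (made-up labels, not data from the study).
allowed = np.array([[0, 1, 0],
                    [2, 2, 2],
                    [0, 1, 0]])
not_allowed = np.array([[3, 3],
                        [3, 4]])

all_ok = all(is_rectangular(allowed, c) for c in (0, 1, 2))
l_shape_ok = is_rectangular(not_allowed, 3)
```

Here cluster 0 occupies rows 0 and 2 and columns 0 and 2, which are not adjacent, yet it still counts as rectangular under this definition; the L-shaped cluster 3 does not.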
Figure 3. Tensor-based structured clustering. (a) TNBC clustering with no prior clustering information. (b) HR+ clustering with no prior information. (c) Clustering of all cell lines starting from an initial partition into three clusters. (d) Clustering from an initial partition into five clusters. Note that the colours on the grid represent clustering assignments, and are not reflective of the intensity of any single parameter. (Online version in colour.)
Figure 4. The top four models for each cluster according to the AICc ranking. The strength of interaction is indicated by the size of the arrow. The grey arrows indicate a blocking mechanism for inhibition. Black inhibition arrows indicate a removal mechanism for inhibition. (Online version in colour.)
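As a rough illustration of an AICc ranking like the one in Figure 4, the sketch below uses the standard small-sample corrected Akaike information criterion, AICc = 2k - 2 ln L + 2k(k + 1)/(n - k - 1), where k is the number of fitted parameters, n the number of data points and ln L the maximized log-likelihood. How the paper obtains the likelihood of each ODE model is not stated in this record, so the model names, log-likelihoods and parameter counts below are invented placeholders, and `aicc` and `rank_models` are hypothetical helper names.

```python
def aicc(log_likelihood: float, k: int, n: int) -> float:
    """Corrected Akaike information criterion (lower is better)."""
    if n - k - 1 <= 0:
        raise ValueError("AICc requires n > k + 1")
    aic = 2 * k - 2 * log_likelihood
    return aic + (2 * k * (k + 1)) / (n - k - 1)


def rank_models(fits: dict, n: int, top: int = 4) -> list:
    """Return the `top` models with the lowest AICc.

    `fits` maps a model name to (maximized log-likelihood, parameter count).
    """
    scored = [(name, aicc(ll, k, n)) for name, (ll, k) in fits.items()]
    return sorted(scored, key=lambda item: item[1])[:top]


# Placeholder fits for one cluster (illustrative values only).
example_fits = {
    "model_A": (-10.0, 2),
    "model_B": (-9.5, 5),
}
ranking = rank_models(example_fits, n=20)
```

With these placeholder values the simpler `model_A` ranks first, because the small gain in log-likelihood of `model_B` does not offset its three extra parameters.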