Florian Boecker, Horst Buerger, Nikhil V Mallela, Eberhard Korsching.
Abstract
Until now, no satisfactory tool has existed in tissue microarray (TMA) data analysis for examining the cooperative behavior of all measured markers in a multifactorial TMA approach. The tool developed here, TMAinspiration, offers an analysis option that closes this gap, together with an ecosystem of quality control concepts and supporting scripts that makes the approach a platform for informed practice and further research. The TMAinspiration method addresses the specific demands of TMA analysis by controlling errors and noise with a generalized regression scheme while avoiding the introduction of too many a priori constraints into the analysis of the data. To this end, we test partitions of a proximity table to find optimal support for a ranking scheme of molecular dependencies. Combining several partitions into one ensemble, which balances the optimization process, rests on the main assumption that all of these perspectives on the cellular network must be self-consistent. Several application examples in breast cancer and one in squamous cell carcinoma demonstrate that the procedure confirms a priori knowledge of the expression characteristics of protein markers while also integrating many new results uncovered in the wealth of data from a larger TMA experiment. The code and software are freely available at: http://complex-systems.uni-muenster.de/tma_inspiration.html.
Keywords: cancer; combinatorial algorithm; pathology; protein expression; systems biology; tissue microarray
Year: 2016 PMID: 27398021 PMCID: PMC4928646 DOI: 10.4137/CIN.S39112
Source DB: PubMed Journal: Cancer Inform ISSN: 1176-9351
Figure 1. Algorithm: searching for the order that best explains all connected reference situations. The graph describes the core functionality of the algorithm. Top panel: as an example, a set of eight proteins named A–H was measured. Middle panel: the set of proteins is partitioned into two sets, called test and reference. The two groups can be interchanged, but normally the test group collects less well-characterized proteins, while the reference group comprises well-characterized proteins that mark different equilibrium states of a biological system, eg, contrasting differentiation end points, such as CK 5/6 and CK 8/18, in basal and luminal cells of the mammary gland. Bottom panel: the space of a complete enumeration of all test string permutations of partition 1 is searched for the minimal sum of squares resulting from the generalized regression. The regression is based on the Pearson correlation coefficients of, eg, A–B to A–H.
Note: The red character C serves as a visual marker for recognizing different orders in the string.
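The search described in the Figure 1 caption can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the TMAinspiration implementation: the function names (`pearson`, `line_fit_sse`, `best_order`) are hypothetical, an ordinary least-squares line over the string positions stands in for the generalized regression scheme (whose exact form is not given here), and the costs of all reference markers are simply summed to form the ensemble.

```python
from itertools import permutations
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equally long, non-constant sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def line_fit_sse(values):
    """Residual sum of squares of a least-squares line through (position, value).

    Assumption: a plain straight-line fit replaces the generalized regression.
    """
    n = len(values)
    mean_pos, mean_val = (n - 1) / 2, sum(values) / n
    spread = sum((i - mean_pos) ** 2 for i in range(n))
    slope = 0.0
    if spread > 0:
        slope = sum((i - mean_pos) * (v - mean_val)
                    for i, v in enumerate(values)) / spread
    return sum((v - (mean_val + slope * (i - mean_pos))) ** 2
               for i, v in enumerate(values))


def best_order(data, test, reference):
    """Enumerate all orders of the test markers and keep the one with minimal cost.

    data maps a marker name to its measurements over the same TMA cores.
    """
    # Pearson correlation of every test marker with every reference marker.
    corr = {(t, r): pearson(data[t], data[r]) for t in test for r in reference}
    best_perm, best_cost = None, float("inf")
    for perm in permutations(test):
        # Sum of squares over all reference markers (the assumed ensemble cost).
        cost = sum(line_fit_sse([corr[(t, r)] for t in perm]) for r in reference)
        if cost < best_cost:
            best_perm, best_cost = perm, cost
    return best_perm, best_cost


# Toy values for illustration only, not TMA measurements.
toy = {
    "R": [1, 2, 3, 4, 5, 6],
    "A": [2, 4, 6, 8, 10, 12],
    "B": [2, 1, 4, 3, 6, 5],
    "C": [4, 3, 6, 1, 2, 5],
}
order, cost = best_order(toy, test=["A", "B", "C"], reference=["R"])
print(order, cost)
```

Under a straight-line fit, an order and its reverse have the same cost, so the sketch reports whichever of the two it meets first. The full enumeration visits every permutation of the test partition, so it is practical only for small partitions.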
Figure 2. Software performance. (A–C) The run-time-determining step for larger calculations, the computation of the permutations of the test partition, is illustrated. The results of the factorial function used to calculate the number of permutations are given for test partition sizes from 1 to 17. (A and B) For larger partition sizes, the combinatorial space grows dramatically. (A) uses a logarithmic scale, while (B) uses a linear scale. For comparison, a logarithmic growth (green) and a linear growth (blue) are also shown. (C) The resulting computational run time for the parallelization technologies MPI and OpenMP. (D) The computational cost of the tool tins_mpi/omp (all combinations) as a function of the number of CPU cores or pipelines and of the parallelization technology. (E–H) The performance values of the tool tins_s_mpi/omp (best order) as a function of the number of CPU cores or pipelines, the resampling number, and the parallelization technology.
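The factorial growth behind panels A and B of Figure 2 is easy to reproduce. The short sketch below only tabulates the number of permutations for test partition sizes 1 to 17. The function name `permutation_counts` is hypothetical, and no run times are modeled.

```python
from math import factorial


def permutation_counts(max_size=17):
    """Number of orderings (k!) for each test partition size k = 1..max_size."""
    return {k: factorial(k) for k in range(1, max_size + 1)}


counts = permutation_counts()
for size, count in counts.items():
    # The number of digits grows steadily, which is what the log-scale panel shows.
    print(f"{size:2d} {count:>16d} {len(str(count)):2d} digits")
```

A test partition of 10 markers already yields 3,628,800 orders, and one of 17 markers yields 355,687,428,096,000, which is why the enumeration step dominates the run time for larger partitions.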
Figure 3. Software workflow. The workflow highlights the major process steps. (A) Generating or adopting a multivariate data set of discrete or continuous measurements that is based on one and the same TMA block (series), and selecting the data partitions. (B) Performing the calculations and verifying that the results are reliable and that no warnings were reported in the output files. (C) Importing the results into the mathematical platform R via the provided scripts and analyzing or visualizing them. A complete test environment for this workflow is provided via the software link.