Eduardo N Castanho1, Helena Aidos1, Sara C Madeira2.
Abstract
BACKGROUND: The effectiveness of biclustering, the simultaneous clustering of rows and columns in a data matrix, has been shown in gene expression data analysis. Several researchers recognize its potential in other research areas. Nevertheless, the last two decades have witnessed the development of a significant number of biclustering algorithms targeting gene expression data analysis and a lack of consistent studies exploring the capacities of biclustering outside this traditional application domain.
Keywords: Biclustering; Neurosciences; Time series analysis; fMRI
Year: 2022 PMID: 35606701 PMCID: PMC9126639 DOI: 10.1186/s12859-022-04733-8
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.307
Biclustering search strategies as defined by Madeira and Oliveira [7]
| Category | General characteristics | Examples |
|---|---|---|
| Greedy | Biclusters are generated by adding or removing columns to an initial random bicluster in order to improve some gain function. The final objective is for the algorithm to find a global minimum solution after some iterations. Although they can make wrong decisions and lose good biclusters by getting stuck in local minima, they have the potential to be fast | ISA, XMotifs [ |
| Distribution parameter identification | Assume some statistical model behind the data, and then apply an iterative procedure to estimate its parameters by minimizing some criterion | FABIA, spectral biclustering [ |
| Divide and conquer | Divide the original data matrix into smaller instances. Although potentially very fast, they can fail to find good biclusters that are split before being identified | Bimax [ |
| Exhaustive | Based on the premise that the best biclusters can only be found by exhaustively enumerating all possible biclusters in the data matrix. They can find the best biclusters, but only by restricting the bicluster size (since these algorithms are typically very slow) | BicPAM, CCC [ |
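To make the greedy strategy in the table concrete, the following is a minimal sketch of a row/column deletion loop in the spirit of Cheng and Church's single node deletion. It is not the ISA or XMotifs procedure. It assumes the mean squared residue (MSR) as the gain function and, for simplicity, starts from the full matrix rather than from a random initial bicluster. The function names, the toy matrix `data` and the threshold `delta` are hypothetical.

```python
def msr_parts(matrix, rows, cols):
    """Return (MSR, per-row mean residue, per-column mean residue) for a sub-matrix."""
    row_mean = {r: sum(matrix[r][c] for c in cols) / len(cols) for r in rows}
    col_mean = {c: sum(matrix[r][c] for r in rows) / len(rows) for c in cols}
    overall = sum(row_mean.values()) / len(rows)
    # Squared residue of each cell with respect to row, column and overall means.
    sq = {(r, c): (matrix[r][c] - row_mean[r] - col_mean[c] + overall) ** 2
          for r in rows for c in cols}
    row_score = {r: sum(sq[r, c] for c in cols) / len(cols) for r in rows}
    col_score = {c: sum(sq[r, c] for r in rows) / len(rows) for c in cols}
    msr = sum(sq.values()) / len(sq)
    return msr, row_score, col_score


def greedy_deletion(matrix, delta):
    """Drop the worst row or column until the MSR is at most delta (delta > 0 assumed)."""
    rows = list(range(len(matrix)))
    cols = list(range(len(matrix[0])))
    msr, row_score, col_score = msr_parts(matrix, rows, cols)
    while msr > delta and (len(rows) > 1 or len(cols) > 1):
        worst_row = max(rows, key=row_score.get)
        worst_col = max(cols, key=col_score.get)
        # Only drop from a dimension that still has more than one element.
        drop_row = len(rows) > 1 and (
            len(cols) == 1 or row_score[worst_row] >= col_score[worst_col]
        )
        if drop_row:
            rows.remove(worst_row)
        else:
            cols.remove(worst_col)
        msr, row_score, col_score = msr_parts(matrix, rows, cols)
    return rows, cols, msr


# Hypothetical toy matrix: rows are regions, columns are time points.
data = [
    [1.0, 2.0, 3.0, 9.0],
    [2.0, 3.0, 4.0, 0.0],
    [3.0, 4.0, 5.0, 6.0],
    [8.0, 1.0, 7.0, 2.0],
]
rows, cols, final_msr = greedy_deletion(data, delta=0.01)
print(rows, cols, final_msr)
```

Each step takes the locally best deletion, so the loop is fast but gives no guarantee of returning the largest coherent sub-matrix, which is the trade-off the table describes.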
Fig. 1 Correlation time series between two non-adjacent brain regions. Biclustering would be able to detect precisely these types of correlation patterns, while ignoring the non-correlated regions and time points. This allows more flexible structures to be obtained than with traditional clustering (Figure adapted from [1])
Fig. 2 Whole brain time series. In the heatmap, some row and column clusters are visible, as well as some events that happen only for some regions at some specific time points. This latter type of structure is not detected by traditional clustering tasks (Figure adapted from [1])
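The local correlation pattern described in these captions can be illustrated with a toy example. The sketch below builds two synthetic "region" signals that are coupled only during a window of time points, then compares the Pearson correlation over the whole series with the correlation inside the window. The signals, the window bounds and all names are hypothetical and are not taken from the study's data.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient between two equally long sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


# Two toy regions over 60 time points, coupled only for 20 <= t < 40.
time_points = range(60)
region_a = [math.sin(0.7 * t) for t in time_points]
region_b = [
    math.sin(0.7 * t) if 20 <= t < 40 else math.cos(1.3 * t + 1.0)
    for t in time_points
]

full_r = pearson(region_a, region_b)                   # whole series
window_r = pearson(region_a[20:40], region_b[20:40])   # coupled window only
print(f"whole series: {full_r:.2f}, window: {window_r:.2f}")
```

A method that looks only at whole rows sees the diluted correlation, whereas a bicluster restricted to the two regions and the coupled time points captures the strong local relation.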
Fig. 3 General methodology followed by comparison studies
Data collections used to evaluate the performance of the biclustering algorithms. We have a total of 42 datasets, each one representing one brain scan
| Data collection name | Nature | #Brain scans | #Time points | # Regions | References |
|---|---|---|---|---|---|
| “First data collection” | Artificial | 1 | 150 | | [ |
| “Artificial data” | Artificial | 20 | 150 | | [ |
| “Real data” | Real | 20 | 94 | 463 | [ |
| “Illustrative data” | Real | 1 | 137 | 45 | [ |
Biclustering algorithms considered for this study. Additionally, they will be compared to three popular clustering algorithms: k-means, spectral, and Ward’s hierarchical methods. For clustering, we use scikit-learn implementations [64]
| Algorithm | Type of search | Available at | References | Reason to choose it |
|---|---|---|---|---|
| BicPAM | Exhaustive | BicPAMS | [ | State-of-the-art pattern mining based biclustering method |
| CCC | Exhaustive | BiGGEsTS | [ | Efficiently obtains temporally contiguous biclusters |
| ISA | Greedy | isa2 | [ | State-of-the-art greedy algorithm able to deal with real data |
| XMotifs | Greedy | biclust | [ | State-of-the-art greedy algorithm based on a strategy of discretizing data |
| Bimax | Divide and conquer | biclust | [ | Very fast algorithm able to detect simple structures |
| FABIA | Distribution parameter identification | FABIA | [ | State-of-the-art algorithm |
| Spectral Biclustering | Distribution parameter identification | biclust | [ | State-of-the-art algorithm able to detect a specific type of bicluster structures |
Fig. 4 Typically, biclustering algorithms require some preprocessing step (either a normalization or a discretization step). For a fair comparison between multiple biclustering algorithms, a post-processing step is applied to guarantee that each bicluster contains the original values present in the original data matrix
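The pre- and post-processing idea in this caption can be sketched as follows. This is a minimal illustration, assuming row-wise z-score normalization and a two-symbol discretization as the preprocessing step. The actual choices differ per algorithm, and the helper names and toy matrix are hypothetical. The key point is that the bicluster is stored as row and column indices, so the original values can always be recovered.

```python
from statistics import mean, pstdev


def zscore_rows(matrix):
    """Normalize each row to zero mean and unit variance (one common choice)."""
    out = []
    for row in matrix:
        mu, sigma = mean(row), pstdev(row)
        out.append([(v - mu) / sigma if sigma > 0 else 0.0 for v in row])
    return out


def discretize(matrix, threshold=0.0):
    """Map values to two symbols: 1 above the threshold, 0 otherwise."""
    return [[1 if v > threshold else 0 for v in row] for row in matrix]


def restore_original(original, rows, cols):
    """Post-processing: rebuild the bicluster from the original data matrix."""
    return [[original[r][c] for c in cols] for r in rows]


original = [
    [5.0, 9.0, 5.0, 9.0],
    [1.0, 3.0, 1.0, 3.0],
    [7.0, 7.5, 8.0, 6.0],
]
symbols = discretize(zscore_rows(original))

# A biclustering algorithm would run on `symbols`; suppose it returned
# these (hypothetical) row and column indices.
rows, cols = [0, 1], [1, 3]
bicluster = restore_original(original, rows, cols)
print(symbols)
print(bicluster)
```

Evaluating every algorithm on the restored values, rather than on its own normalized or discretized representation, keeps the quality measures comparable across methods.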
Median values for the four selected measures for the first artificial dataset, with uncertainties given by the standard deviation (except for SMSR, where the standard deviation is orders of magnitude higher than the median value). These results show that (A) the high uncertainty discourages focusing on optimizing the biclustering method parameters and (B) choosing the right evaluation metric is important, although in most cases the metrics seem to agree on the same “best” configuration. Bold represents the chosen parameters for the next sections
| Method | Configuration | VAR | MSR | SMSR | VE |
|---|---|---|---|---|---|
| BicPAM | 2.49 | ||||
| Constant version | 1.96 | ||||
| Multiplicative | 3.07 | ||||
| Bimax | 10 biclusters | 0.03 | |||
| 100 biclusters | 0.05 | ||||
| 1000 biclusters | 0.06 | ||||
| 0.06 | |||||
| 100,000 Biclusters | 0.06 | ||||
| CCC | 0.51 | ||||
| Variation between time points (2 Symbols) | 0.93 | ||||
| Variation between time points (3 Symbols) | 0.93 | ||||
| FABIA | 0.51 | ||||
| Relaxed | 3485.60 | ||||
| Spectral Biclustering | 670.30 | ||||
| bistochastization | 3232.21 | ||||
| irrc | 1271.71 | ||||
| XMotifs | Discretization with 2 symbols | 1168.76 | |||
| 6.59 |
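Two of the measures in the table, variance (VAR) and mean squared residue (MSR), are simple to compute. The sketch below is an illustration only: it assumes the variance is normalized by the number of cells, uses the standard Cheng and Church definition of the MSR, and operates on a bicluster given as a list of rows. The function names and the toy bicluster are hypothetical, and the sketch does not cover SMSR or VE.

```python
def bicluster_variance(b):
    """Mean squared deviation of all cells from the overall bicluster mean."""
    n = len(b) * len(b[0])
    mu = sum(sum(row) for row in b) / n
    return sum((v - mu) ** 2 for row in b for v in row) / n


def mean_squared_residue(b):
    """MSR: mean of (a_ij - rowmean_i - colmean_j + overall_mean)^2."""
    n_rows, n_cols = len(b), len(b[0])
    row_means = [sum(row) / n_cols for row in b]
    col_means = [sum(b[i][j] for i in range(n_rows)) / n_rows
                 for j in range(n_cols)]
    overall = sum(row_means) / n_rows
    total = 0.0
    for i in range(n_rows):
        for j in range(n_cols):
            residue = b[i][j] - row_means[i] - col_means[j] + overall
            total += residue ** 2
    return total / (n_rows * n_cols)


# A perfectly shifting bicluster: the second row is the first row plus 3.
shifting = [
    [1.0, 2.0, 3.0],
    [4.0, 5.0, 6.0],
]
var_value = bicluster_variance(shifting)
msr_value = mean_squared_residue(shifting)
print(var_value, msr_value)
```

The example shows why the choice of measure matters: the variance penalizes this bicluster although its rows follow exactly the same trend, whereas the MSR treats it as perfectly coherent.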
Fig. 5 Comparing the general capacity of biclustering algorithms versus the two clustering variants in artificial and real data. This is done by aggregating the results from all algorithms and comparing the median value of the metric. The results support the capacity of biclustering to produce promising results when analysing these data
Fig. 6 Virtual Error measure for every tested algorithm in our artificial data collection. Despite large oscillations, the median performance of the exhaustive approaches (CCC and BicPAM) shows promising results in comparison with the remaining biclustering approaches
Fig. 7 Virtual Error measure for every tested algorithm in our real data collection. Although the biclustering algorithms are not indisputably better than traditional clustering, exhaustive biclustering approaches such as CCC and BicPAM show a good capacity for generating coherent biclusters
Fig. 8 Comparison between the best generated biclusters for each biclustering algorithm. From the previous biclustering solutions, the top-k biclusters are selected (ranking by the virtual error and removing biclusters with virtual error smaller than 0.01). The results agree with the previous conclusions, pointing to a high capacity of the exhaustive algorithms to generate good biclusters. Additionally, the ISA results suggest that, while its general performance is bad, it is capable of generating some good biclusters
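The top-k selection mentioned in this caption could look like the sketch below. It assumes that a lower virtual error indicates a more coherent bicluster and that biclusters below the 0.01 cut-off are discarded before ranking. The function name, the `(name, virtual_error)` tuple layout and the scores are hypothetical.

```python
def top_k_biclusters(scored, k, min_ve=0.01):
    """Keep biclusters with virtual error >= min_ve, then return the k lowest."""
    kept = [(name, ve) for name, ve in scored if ve >= min_ve]
    return sorted(kept, key=lambda item: item[1])[:k]


# Hypothetical (bicluster id, virtual error) pairs for one algorithm.
scored = [("B1", 0.004), ("B2", 0.35), ("B3", 0.02), ("B4", 0.12), ("B5", 0.0)]
best = top_k_biclusters(scored, k=2)
print(best)
```

Comparing only the best few biclusters per algorithm separates "can this method produce any good bicluster" from "how good is its typical output", which is the distinction the caption draws for ISA.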
Median values (and associated standard deviation) for the typical bicluster dimension parameters in both data collections: number of regions in each bicluster, number of time points and bicluster area. When comparing these results to the virtual error values, an apparent relation emerges between the bicluster size and the associated virtual error, which makes sense
| Algorithms | Artificial data | Real data | ||||
|---|---|---|---|---|---|---|
| Time points | Region points | Area | Time points | Region points | Area | |
| Bimax | ||||||
| BicPAM | ||||||
| CCC | ||||||
| FABIA | ||||||
| Spectral Biclustering | ||||||
| ISA | ||||||
| XMotifs | ||||||
| kmeans | 150 | 94 | ||||
| spectral | 150 | 94 | ||||
| ward | 150 | 94 | ||||
| kmeans | 26 | 463 | ||||
| spectral | 26 | 463 | ||||
| ward | 26 | 463 | ||||
Correlation between the virtual error and the three specific coherence measures: Variance (constant biclusters), MSR (shifting biclusters) and SMSR (scaling biclusters). Most of the algorithms agree that the expected patterns are of shifting nature
| Algorithms | Artificial Data | Real Data | Type of Pattern | ||||
|---|---|---|---|---|---|---|---|
| Variance | MSR | SMSR | Variance | MSR | SMSR | ||
| BicPAM | 0.009 | 0.000 | 0.040 | 0.000 | Shifting | ||
| Bimax | 0.003 | 0.037 | 0.001 | 0.000 | Constant/scaling | ||
| CCC | 0.007 | 0.000 | 0.033 | 0.000 | Constant | ||
| FABIA | 0.051 | 0.061 | 0.003 | 0.001 | Shifting | ||
| ISA | 0.050 | 0.000 | 0.040 | 0.000 | Shifting | ||
| Spectral biclustering | 0.009 | 0.000 | 0.103 | 0.000 | Shifting | ||
| XMotifs | 0.115 | 0.016 | 0.003 | 0.000 | Shifting | ||
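The three pattern types named in the table (constant, shifting, scaling) can be made concrete with a small sketch. It assumes the usual definitions: in a shifting bicluster every row equals a base row plus a row-specific constant, and in a scaling bicluster every row equals the base row times a row-specific factor. The helper names and the toy values are hypothetical.

```python
def make_shifting(base, shifts):
    """Each row is the base pattern plus a row-specific offset."""
    return [[v + s for v in base] for s in shifts]


def make_scaling(base, factors):
    """Each row is the base pattern times a row-specific factor."""
    return [[v * f for v in base] for f in factors]


def is_shifting(b, tol=1e-9):
    """True if every row differs from the first row by a constant."""
    ref = b[0]
    for row in b:
        diffs = [v - r for v, r in zip(row, ref)]
        if max(diffs) - min(diffs) > tol:
            return False
    return True


def is_scaling(b, tol=1e-9):
    """True if every row is a constant multiple of the first row.

    Assumes the first row contains no zeros.
    """
    ref = b[0]
    for row in b:
        ratios = [v / r for v, r in zip(row, ref)]
        if max(ratios) - min(ratios) > tol:
            return False
    return True


base = [1.0, 2.0, 4.0]
shifting = make_shifting(base, shifts=[0.0, 1.5, -2.0])
scaling = make_scaling(base, factors=[1.0, 2.0, 0.5])
constant = [[3.0, 3.0], [3.0, 3.0]]

print(is_shifting(shifting), is_scaling(shifting))
print(is_shifting(scaling), is_scaling(scaling))
print(is_shifting(constant), is_scaling(constant))
```

A constant bicluster satisfies both checks, which is why a measure tuned to one pattern type can still score biclusters of a simpler type well.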
Fig. 9 Heatmap of the illustrative data. Interaction between brain regions is local: some brain regions interact together in some time points. Traditional clustering analysis is not able to automatically discover these structures
Fig. 10 First example bicluster
Fig. 11 Second example bicluster
Fig. 12 Third example bicluster
Fig. 13 Differences between clustering and biclustering. While clustering methods yield only disjoint strips in the data matrix, biclustering finds more flexible structures (Figure adapted from [6])
Fig. 14 Differences between ICA and biclustering. While ICA decomposes the original matrix, biclustering generates an arbitrary number of sub-matrices (depending on the algorithm)
Fig. 15 Relation between biclustering and graph theory: a bicluster can be seen as a submodule in a network
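One possible reading of this relation, offered as an assumption rather than as the construction used in the figure, is to view the data matrix as a bipartite graph between regions and time points. An edge exists whenever the activity exceeds a threshold, and a bicluster of high values then corresponds to a complete bipartite subgraph (a biclique). The function names, the threshold and the toy matrix are hypothetical.

```python
def bipartite_edges(matrix, threshold):
    """Connect region i to time point j whenever matrix[i][j] exceeds the threshold."""
    return {
        (i, j)
        for i, row in enumerate(matrix)
        for j, value in enumerate(row)
        if value > threshold
    }


def is_biclique(edges, regions, time_points):
    """True if every selected region is connected to every selected time point."""
    return all((i, j) in edges for i in regions for j in time_points)


# Rows are regions, columns are time points (toy values).
activity = [
    [0.9, 0.8, 0.1, 0.7],
    [0.8, 0.9, 0.2, 0.9],
    [0.1, 0.2, 0.9, 0.1],
]
edges = bipartite_edges(activity, threshold=0.5)
dense = is_biclique(edges, regions=[0, 1], time_points=[0, 1, 3])
mixed = is_biclique(edges, regions=[0, 2], time_points=[0, 1])
print(dense, mixed)
```

Under this view, searching for biclusters in a binarized matrix amounts to searching for dense submodules of the bipartite network.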
Fig. 16 After discovering the biclusters for a group of subjects, a data matrix can be obtained locating the presence of some bicluster in a subject, and then used for classification tasks. The biclusters (sets of features and corresponding representative values) are used as features (Figure adapted from [112])
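The feature construction described in this caption can be sketched as a subject-by-bicluster presence matrix. The caption does not specify how presence is decided, so the criterion below is an assumption made for illustration: a bicluster counts as present when the mean absolute deviation between the subject's values and the bicluster's representative values is within a tolerance. All names, the tolerance and the toy data are hypothetical.

```python
def bicluster_present(subject, rows, cols, pattern, tol=0.5):
    """Hypothetical presence test: mean absolute deviation from the pattern <= tol."""
    deviations = [
        abs(subject[r][c] - pattern[i][j])
        for i, r in enumerate(rows)
        for j, c in enumerate(cols)
    ]
    return sum(deviations) / len(deviations) <= tol


def presence_matrix(subjects, biclusters, tol=0.5):
    """One row per subject, one binary feature per bicluster."""
    return [
        [1 if bicluster_present(s, b["rows"], b["cols"], b["pattern"], tol) else 0
         for b in biclusters]
        for s in subjects
    ]


# Two toy subjects (regions x time points) and two toy biclusters.
subject_a = [[1.0, 1.1, 5.0], [0.9, 1.0, 7.0]]
subject_b = [[4.0, 0.0, 5.2], [3.0, 2.0, 6.9]]
biclusters = [
    {"rows": [0, 1], "cols": [0, 1], "pattern": [[1.0, 1.0], [1.0, 1.0]]},
    {"rows": [0, 1], "cols": [2], "pattern": [[5.0], [7.0]]},
]
features = presence_matrix([subject_a, subject_b], biclusters)
print(features)
```

The resulting binary matrix can be passed to any standard classifier, with each column interpretable as "this local pattern occurs in this subject".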