Vincent Branders, Pierre Schaus, Pierre Dupont.
Abstract
BACKGROUND: Transcriptome analysis aims at gaining insight into cellular processes by discovering gene expression patterns across various experimental conditions. Biclustering is a standard approach for discovering subsets of genes with similar expression across subgroups of samples, where both subsets have to be identified. The result is a set of biclusters, each forming a specific submatrix of rows (e.g. genes) and columns (e.g. samples). Relevant biclusters can, however, be missed when a few outliers break the assumed homogeneity of expression values for a few gene/sample combinations. The Max-Sum SubMatrix problem addresses this issue by looking for highly expressed subsets of genes and of samples, without enforcing such homogeneity.
Keywords: Biclustering; Gene enrichment analysis; Gene expression analysis; Identification of significant GO terms
Year: 2019 PMID: 31795929 PMCID: PMC6888937 DOI: 10.1186/s12859-019-3289-0
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169
Fig. 1 Toy example. a Illustration of a full, normalized matrix, b the associated submatrix of maximal sum, c the bicluster returned by CCA and d the bicluster returned by ISA
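The Max-Sum SubMatrix problem behind panel b can be illustrated on a tiny matrix. The sketch below is not the authors' CPGC algorithm. It is an exhaustive search that relies on one simple observation: once a subset of rows is fixed, the best columns are exactly those whose sum over the selected rows is positive. The function name `max_sum_submatrix` and the `toy` matrix are hypothetical and do not reproduce the data of Fig. 1.

```python
from itertools import combinations


def max_sum_submatrix(matrix):
    """Exhaustive search over row subsets (only feasible for tiny matrices).

    For a fixed set of rows, the optimal columns are those whose sum over
    the selected rows is strictly positive.
    """
    n_rows, n_cols = len(matrix), len(matrix[0])
    best = (0.0, (), ())  # the empty submatrix has sum 0
    for size in range(1, n_rows + 1):
        for rows in combinations(range(n_rows), size):
            col_sums = [sum(matrix[i][j] for i in rows) for j in range(n_cols)]
            cols = tuple(j for j, s in enumerate(col_sums) if s > 0)
            total = sum(col_sums[j] for j in cols)
            if total > best[0]:
                best = (total, rows, cols)
    return best


# Hypothetical normalized expression values (rows: genes, columns: samples).
toy = [
    [2.0, 1.5, -1.0, -2.0],
    [1.0, -0.5, -2.0, 0.5],
    [3.0, 2.0, -0.5, -1.0],
    [-1.0, -2.0, 1.0, -0.5],
]

total, rows, cols = max_sum_submatrix(toy)
print(total, rows, cols)  # 9.0 (0, 1, 2) (0, 1)
```

The selected submatrix contains the negative entry at row 1, column 1. Keeping that row still pays off because its other selected entry raises the total, which is the tolerance to a few outliers that the abstract describes.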
Total number of identified and enriched biclusters
| Algorithm | Biclusters | Enriched biclusters |
|---|---|---|
| CCA | 349 | 108 |
| ISA | 163 | 90 |
| K-CPGC | 342 | |
| Plaid | 102 | 57 |
| QUBIC | 269 | 107 |
| Spectral | 147 | 44 |
| xMOTIFs | 309 | 60 |
| CPGC | 35 | 35 |
Results reported for each algorithm on the 35 gene expression datasets from human tissues and Saccharomyces cerevisiae. The defined target is K=10 biclusters for each dataset, for a maximum of 350 biclusters overall. A bicluster is considered significantly enriched if the subset of genes it contains is associated with at least one GO term with an FDR-corrected p-value below 5%
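The enrichment criterion above can be sketched as follows. The caption does not name the FDR procedure, so the Benjamini-Hochberg adjustment used here is an assumption, as is the choice to correct over the GO terms tested for a single bicluster. The function names and the p-values are hypothetical.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest to the smallest p-value so that the adjusted
    # values stay monotone in the raw p-values.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def is_enriched(go_p_values, alpha=0.05):
    """A bicluster counts as enriched if any adjusted p-value is below alpha."""
    return any(q < alpha for q in benjamini_hochberg(go_p_values))


# Hypothetical raw p-values, one per GO term tested for one bicluster.
raw = [0.01, 0.04, 0.03, 0.20]
print(benjamini_hochberg(raw))
print(is_enriched(raw))
```

With these made-up values only the first GO term stays below the 5% threshold after adjustment, which is enough for the bicluster to count as enriched.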
Number of enriched biclusters found by each algorithm on each dataset
| dataset | CCA | ISA | K-CPGC | Plaid | QUBIC | Spectral | xMOTIFs | CPGC |
|---|---|---|---|---|---|---|---|---|
| 1 | 1 (5.0) | 6 (2.0) | 1 (5.0) | 0 (7.5) | 0 (7.5) | 2 (3.0) | 1 (5.0) | |
| 2 | 0 (7.5) | 6 (2.0) | 1 (5.0) | 1 (5.0) | 0 (7.5) | 2 (3.0) | 1 (5.0) | |
| 3 | 1 (6.0) | 7 (2.0) | 1 (6.0) | 5 (3.0) | 0 (8.0) | 3 (4.0) | 1 (6.0) | |
| 4 | 1 (5.5) | 2 (2.5) | 1 (5.5) | 2 (2.5) | 0 (8.0) | 1 (5.5) | 1 (5.5) | |
| 5 | 2 (4.0) | 2 (4.0) | 1 (6.5) | 0 (8.0) | 2 (4.0) | 1 (6.5) | ||
| 6 | 1 (5.5) | 5 (2.0) | 2 (4.0) | 0 (7.5) | 0 (7.5) | 3 (3.0) | 1 (5.5) | |
| 7 | 2 (5.0) | 3 (4.0) | 1 (7.0) | 7 (2.5) | 7 (2.5) | 1 (7.0) | 1 (7.0) | |
| 8 | 3 (5.0) | 1 (7.5) | 6 (3.0) | 5 (4.0) | 2 (6.0) | 1 (7.5) | ||
| 9 | 1 (7.0) | 2 (4.5) | 1 (7.0) | 6 (2.5) | 6 (2.5) | 2 (4.5) | 1 (7.0) | |
| 10 | 1 (5.0) | 1 (5.0) | 1 (5.0) | 0 (8.0) | 2 (2.0) | 1 (5.0) | 1 (5.0) | |
| 11 | 0 (7.5) | 1 (4.5) | 0 (7.5) | 1 (4.5) | 1 (4.5) | 1 (4.5) | ||
| 12 | 2 (3.5) | 7 (2.0) | 0 (7.5) | 2 (3.5) | 0 (7.5) | 1 (5.5) | 1 (5.5) | |
| 13 | 0 (7.0) | 0 (7.0) | 2 (3.5) | 0 (7.0) | 2 (3.5) | 1 (5.0) | ||
| 14 | 2 (4.5) | 3 (2.5) | 1 (6.5) | 0 (8.0) | 2 (4.5) | 3 (2.5) | 1 (6.5) | |
| 15 | 2 (4.5) | 0 (8.0) | 2 (4.5) | 1 (6.5) | 1 (6.5) | |||
| 16 | 1 (5.5) | 0 (7.5) | 4 (3.0) | 0 (7.5) | 2 (4.0) | 1 (5.5) | ||
| 17 | 0 (7.0) | 1 (4.5) | 0 (7.0) | 0 (7.0) | 1 (4.5) | |||
| 18 | 2 (2.0) | 1 (4.0) | 0 (7.0) | 1 (4.0) | 0 (7.0) | 0 (7.0) | 1 (4.0) | |
| 19 | 8 (2.5) | 0 (7.0) | 6 (4.0) | 8 (2.5) | 0 (7.0) | 0 (7.0) | 1 (5.0) | |
| 20 | 6 (2.0) | 3 (4.5) | 3 (4.5) | 4 (3.0) | 0 (8.0) | 2 (6.0) | 1 (7.0) | |
| 21 | 2 (4.0) | 1 (6.5) | 4 (2.0) | 2 (4.0) | 0 (8.0) | 2 (4.0) | 1 (6.5) | |
| 22 | 6 (2.0) | 1 (6.5) | 0 (8.0) | 3 (4.5) | 5 (3.0) | 3 (4.5) | 1 (6.5) | |
| 23 | 2 (2.5) | 0 (7.0) | 0 (7.0) | 2 (2.5) | 0 (7.0) | 1 (4.5) | 1 (4.5) | |
| 24 | 0 (6.5) | 0 (6.5) | 0 (6.5) | 0 (6.5) | 1 (3.5) | 1 (3.5) | ||
| 25 | 0 (7.0) | 3 (3.0) | 0 (7.0) | 0 (7.0) | 1 (4.5) | 1 (4.5) | ||
| 26 | 0 (7.5) | 2 (4.5) | 2 (4.5) | 0 (7.5) | 3 (3.0) | 1 (6.0) | ||
| 27 | 1 (6.5) | 3 (2.0) | 2 (4.0) | 2 (4.0) | 0 (8.0) | 2 (4.0) | 1 (6.5) | |
| 28 | 0 (7.0) | 4 (2.0) | 2 (3.5) | 2 (3.5) | 0 (7.0) | 0 (7.0) | 1 (5.0) | |
| 29 | 3 (3.5) | 1 (6.0) | 3 (3.5) | 4 (2.0) | 0 (8.0) | 1 (6.0) | 1 (6.0) | |
| 30 | 0 (7.5) | 2 (2.5) | 1 (5.0) | 1 (5.0) | 0 (7.5) | 2 (2.5) | 1 (5.0) | |
| 31 | 4 (2.5) | 1 (5.5) | 4 (2.5) | 1 (5.5) | 0 (8.0) | 1 (5.5) | 1 (5.5) | |
| 32 | 0 (7.5) | 5 (4.5) | 7 (3.0) | 0 (7.5) | 5 (4.5) | 1 (6.0) | ||
| 33 | 6 (3.0) | 0 (8.0) | 7 (2.0) | 3 (4.0) | 1 (6.0) | 1 (6.0) | 1 (6.0) | |
| 34 | 6 (2.0) | 0 (7.5) | 4 (3.0) | 3 (4.5) | 0 (7.5) | 3 (4.5) | 1 (6.0) | |
| 35 | 0 (6.5) | 5 (2.0) | 2 (3.0) | 0 (6.5) | 0 (6.5) | 0 (6.5) | 1 (4.0) | |
| avg. rank | 3.7 | 4.5 | 5.4 | 3.9 | 6.2 | 4.7 | 5.6 |
Numbers in parentheses are the associated ranks. In case of ties, average ranks are assigned. The last row corresponds to the algorithm ranks averaged over the 35 datasets. Best performances are highlighted in bold. It is observed that all enriched biclusters have different GO enrichment. Note that CPGC is the original algorithm identifying a single submatrix of maximal sum per dataset
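The ranking rule described in this note can be sketched as below: on each dataset the algorithm with the most enriched biclusters gets rank 1, and tied algorithms share the mean of the ranks they occupy. The function names and the counts are hypothetical and are not taken from the table.

```python
def average_ranks(counts):
    """Rank counts so the largest gets rank 1; ties share the mean rank."""
    order = sorted(range(len(counts)), key=lambda i: -counts[i])
    ranks = [0.0] * len(counts)
    pos = 0
    while pos < len(order):
        # Extend the group while the next count equals the current one.
        end = pos
        while end + 1 < len(order) and counts[order[end + 1]] == counts[order[pos]]:
            end += 1
        mean_rank = (pos + 1 + end + 1) / 2  # mean of 1-based ranks pos+1..end+1
        for k in range(pos, end + 1):
            ranks[order[k]] = mean_rank
        pos = end + 1
    return ranks


def mean_rank_per_algorithm(count_rows):
    """Average the per-dataset ranks of each algorithm over all datasets."""
    rank_rows = [average_ranks(row) for row in count_rows]
    n = len(rank_rows)
    return [sum(col) / n for col in zip(*rank_rows)]


# Hypothetical enriched-bicluster counts for 8 algorithms on one dataset.
counts = [1, 7, 9, 5, 0, 3, 1, 1]
print(average_ranks(counts))  # [6.0, 2.0, 1.0, 3.0, 8.0, 4.0, 6.0, 6.0]
```

The three algorithms tied at one enriched bicluster occupy ranks 5, 6 and 7, so each receives rank 6.0.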
Fig. 2 Critical difference of ranks. Comparison between the average rank of each algorithm over N=35 datasets, with a 5% level of significance and Hochberg’s correction for multiple testing
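Hochberg's correction named in this caption is a step-up procedure: with the p-values sorted in increasing order, find the largest k such that the k-th p-value is at most alpha / (m - k + 1), then reject the k hypotheses with the smallest p-values. The sketch below assumes the raw p-values of the pairwise rank comparisons are already available. How they are computed is not stated in this excerpt, so the values shown are hypothetical, as is the function name.

```python
def hochberg_reject(p_values, alpha=0.05):
    """Hochberg step-up procedure; returns one reject flag per hypothesis."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    reject = [False] * m
    # Scan from the largest p-value down to the smallest.
    for k in range(m, 0, -1):
        if p_values[order[k - 1]] <= alpha / (m - k + 1):
            # Reject this hypothesis and every one with a smaller p-value.
            for idx in order[:k]:
                reject[idx] = True
            break
    return reject


# Hypothetical raw p-values of four pairwise comparisons.
print(hochberg_reject([0.001, 0.03, 0.04, 0.2]))  # [True, False, False, False]
print(hochberg_reject([0.01, 0.02, 0.04]))        # [True, True, True]
```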
Fig. 3 Comparison of K-CPGC and CCA p-values for enriched GO terms. This figure presents the (logarithmic) ratio of corrected p-values associated with each GO term identified by both K-CPGC and CCA on all the 35 datasets. Positive values (638 GO terms) are in favor of K-CPGC
Fig. 4 Comparison of terms enriched by K-CPGC and CCA for each of the 35 datasets. a Number of terms enriched by K-CPGC and CCA for each of the 35 datasets. The horizontal line reports the number of terms that are enriched by both approaches. b Number of times that the adjusted p-value of a term found by an algorithm is smaller than the adjusted p-value of the term found by the other algorithm. Only terms enriched by both algorithms are considered
Fig. 5 Search tree. This figure illustrates the search tree defined on the set of possible submatrices. A question mark refers to an unbound variable that can be equal to 0 or 1
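A search tree of this kind can be sketched as a depth-first branch and bound. This is an illustration under stated assumptions, not the authors' CPGC implementation: the sketch uses one binary variable per row only, represents an unbound variable (the question mark) as `None`, and prunes with a simple upper bound in which unbound rows contribute only their positive entries. The function name and the matrix are hypothetical.

```python
def branch_and_bound(matrix):
    """Depth-first search over row variables in {1, 0, None (unbound)}."""
    n_rows, n_cols = len(matrix), len(matrix[0])
    best = {"value": 0.0, "rows": (), "cols": ()}

    def column_bounds(assign):
        # Per column: selected rows count fully, unbound rows count only
        # when positive. With no unbound row this is the exact column sum.
        sums = []
        for j in range(n_cols):
            s = 0.0
            for i in range(n_rows):
                if assign[i] == 1:
                    s += matrix[i][j]
                elif assign[i] is None:
                    s += max(0.0, matrix[i][j])
            sums.append(s)
        return sums

    def search(assign, depth):
        sums = column_bounds(assign)
        bound = sum(max(0.0, s) for s in sums)
        if bound <= best["value"]:
            return  # no completion of this node can beat the incumbent
        if depth == n_rows:  # leaf: every variable is bound, bound is exact
            best["value"] = bound
            best["rows"] = tuple(i for i, a in enumerate(assign) if a == 1)
            best["cols"] = tuple(j for j, s in enumerate(sums) if s > 0)
            return
        for value in (1, 0):  # branch: include the row, then exclude it
            assign[depth] = value
            search(assign, depth + 1)
        assign[depth] = None  # restore the unbound state when backtracking

    search([None] * n_rows, 0)
    return best["value"], best["rows"], best["cols"]


# Hypothetical normalized expression values (rows: genes, columns: samples).
example = [
    [2.0, 1.5, -1.0, -2.0],
    [1.0, -0.5, -2.0, 0.5],
    [3.0, 2.0, -0.5, -1.0],
    [-1.0, -2.0, 1.0, -0.5],
]
print(branch_and_bound(example))  # (9.0, (0, 1, 2), (0, 1))
```

The bound is valid because any completion of a partial assignment can add at most the positive entries of the unbound rows to each column sum, so a node whose bound does not exceed the incumbent can be discarded safely.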
Data collection summary
| # | Name | Chip | Genes | Samples | Organism | Tissue/Condition |
|---|---|---|---|---|---|---|
| 1 | armstrong-v1 | Affy | 1081 | 72 | Human | Blood |
| 2 | armstrong-v2 | Affy | 2194 | 72 | Human | Blood |
| 3 | bhattacharjee | Affy | 1543 | 203 | Human | Lung |
| 4 | chowdary | Affy | 182 | 104 | Human | Breast, Colon |
| 5 | dyrskjot | Affy | 1203 | 40 | Human | Bladder |
| 6 | gordon | Affy | 1626 | 181 | Human | Lung |
| 7 | laiho | Affy | 2202 | 37 | Human | Colon |
| 8 | nutt-v1 | Affy | 1377 | 50 | Human | Brain |
| 9 | nutt-v2 | Affy | 1070 | 28 | Human | Brain |
| 10 | nutt-v3 | Affy | 1152 | 22 | Human | Brain |
| 11 | pomeroy-v1 | Affy | 857 | 34 | Human | Brain |
| 12 | pomeroy-v2 | Affy | 1379 | 42 | Human | Brain |
| 13 | ramaswamy | Affy | 1363 | 190 | Human | Multi-tissue |
| 14 | shipp | Affy | 798 | 77 | Human | Blood |
| 15 | singh | Affy | 339 | 102 | Human | Prostate |
| 16 | su | Affy | 1571 | 174 | Human | Multi-tissue |
| 17 | west | Affy | 1198 | 49 | Human | Breast |
| 18 | yeoh-v1 | Affy | 2526 | 248 | Human | Bone marrow |
| 19 | alpha factor | cDNA | 1099 | 18 | Yeast | Cell cycle synchronisation |
| 20 | cdc 15 | cDNA | 1086 | 24 | Yeast | Cell cycle synchronisation |
| 21 | cdc 28 | cDNA | 1044 | 17 | Yeast | Cell cycle synchronisation |
| 22 | elutriation | cDNA | 935 | 14 | Yeast | Cell cycle synchronisation |
| 23 | 1mM menadione | cDNA | 1050 | 9 | Yeast | Environmental modifications |
| 24 | 1M sorbitol | cDNA | 1030 | 7 | Yeast | Environmental modifications |
| 25 | 15mM diamide | cDNA | 1038 | 8 | Yeast | Environmental modifications |
| 26 | 25mM DTT | cDNA | 991 | 8 | Yeast | Environmental modifications |
| 27 | constant 32nM H2O2 | cDNA | 976 | 10 | Yeast | Environmental modifications |
| 28 | diauxic shift | cDNA | 1016 | 7 | Yeast | Environmental modifications |
| 29 | complete DTT | cDNA | 962 | 7 | Yeast | Environmental modifications |
| 30 | heat shock 1 | cDNA | 988 | 8 | Yeast | Environmental modifications |
| 31 | heat shock 2 | cDNA | 999 | 7 | Yeast | Environmental modifications |
| 32 | nitrogen depletion | cDNA | 1011 | 10 | Yeast | Environmental modifications |
| 33 | YPD 1 | cDNA | 1011 | 12 | Yeast | Environmental modifications |
| 34 | YPD 2 | cDNA | 1022 | 10 | Yeast | Environmental modifications |
| 35 | Yeast sporulation | cDNA | 1006 | 7 | Yeast | Sporulation |