Sepp Hochreiter, Ulrich Bodenhofer, Martin Heusel, Andreas Mayr, Andreas Mitterecker, Adetayo Kasim, Tatsiana Khamiakova, Suzy Van Sanden, Dan Lin, Willem Talloen, Luc Bijnens, Hinrich W H Göhlmann, Ziv Shkedy, Djork-Arné Clevert.
Abstract
MOTIVATION: Biclustering of transcriptomic data groups genes and samples simultaneously. It is emerging as a standard tool for extracting knowledge from gene expression measurements. We propose a novel generative approach for biclustering called 'FABIA: Factor Analysis for Bicluster Acquisition'. FABIA is based on a multiplicative model, which accounts for linear dependencies between gene expression and conditions, and also captures heavy-tailed distributions as observed in real-world transcriptomic data. The generative framework makes it possible to use well-founded model selection methods and to apply Bayesian techniques.
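The multiplicative model can be illustrated by sampling from it. The sketch below assumes the factor-analysis form $X = \Lambda Z + \Upsilon$ (genes × samples), in which each bicluster contributes one outer product $\lambda_i z_i^T$. The Laplace-distributed nonzero entries, the fixed numbers of member genes and samples, the noise level and the function name `simulate_fabia_like` are illustrative assumptions, not the paper's simulation protocol.

```python
import numpy as np

def simulate_fabia_like(n_genes=100, n_samples=50, n_bic=3, members=(10, 5),
                        noise_sd=0.2, seed=0):
    """Draw X = sum_i lambda_i z_i^T + noise (genes x samples).

    Hypothetical illustration: nonzero loadings and factors are Laplace
    distributed (heavy-tailed), and each bicluster has a fixed number of
    member genes and samples. Sparsity pattern and parameters are assumptions.
    """
    rng = np.random.default_rng(seed)
    n_g, n_s = members
    Lam = np.zeros((n_genes, n_bic))   # loadings: one column per bicluster
    Z = np.zeros((n_bic, n_samples))   # factors: one row per bicluster
    for i in range(n_bic):
        genes = rng.choice(n_genes, size=n_g, replace=False)
        samples = rng.choice(n_samples, size=n_s, replace=False)
        Lam[genes, i] = rng.laplace(size=n_g)
        Z[i, samples] = rng.laplace(size=n_s)
    noise = rng.normal(scale=noise_sd, size=(n_genes, n_samples))
    X = Lam @ Z + noise                # sum of outer products plus Gaussian noise
    return X, Lam, Z

X, Lam, Z = simulate_fabia_like()
print(X.shape, np.count_nonzero(Lam @ Z))
```

Multiplying two Laplace-distributed quantities yields entries with heavier tails than either factor alone, which is one intuition for why a multiplicative model suits heavy-tailed expression data.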
Year: 2010 PMID: 20418340 PMCID: PMC2881408 DOI: 10.1093/bioinformatics/btq227
Source DB: PubMed Journal: Bioinformatics ISSN: 1367-4803 Impact factor: 6.937
Fig. 1. The outer product $\lambda z^T$ of two sparse vectors results in a matrix with a bicluster. Note that the non-zero entries in the vectors are adjacent to each other for visualization purposes only.
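A quick numerical check of the caption, using NumPy's `np.outer` with arbitrary example vectors: the nonzero cells of the outer product are exactly the combinations of nonzero loading entries (genes) and nonzero factor entries (samples), and they need not be contiguous.

```python
import numpy as np

# Sparse loading vector (genes) and sparse factor vector (samples); example values.
lam = np.array([0.0, 1.5, 0.0, -2.0, 0.0, 0.7])   # member genes: 1, 3, 5
z = np.array([0.0, 0.0, 1.0, 0.0, 3.0])           # member samples: 2, 4

M = np.outer(lam, z)   # lambda z^T, a 6 x 5 rank-one matrix

rows, cols = np.nonzero(M)
print(np.unique(rows).tolist(), np.unique(cols).tolist())   # → [1, 3, 5] [2, 4]
```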
Results on the 100 simulated datasets
| Method | Consensus score (SD) | Method | Consensus score (SD) |
|---|---|---|---|
| 0.006 (5e-5) | |||
| 0.002 (6e-5) | |||
| 0.057 (2e-3) | 0.004 (2e-4) | ||
| 0.045 (9e-4) | 0.001 (7e-6) | ||
| 0.072 (4e-4) | 0.046 (5e-3) | ||
| 0.083 (6e-4) | 0.037 (4e-3) | ||
| 0.333 (5e-2) | 0.006 (3e-5) | ||
| 0.299 (6e-2) | 0.032 (5e-4) | ||
| 0.188 (4e-2) | 0.011 (5e-4) | ||
| 0.012 (1e-4) |
The numbers denote average consensus scores against the true biclusters, as defined in Section 6.1 (standard deviations in parentheses). The best results are highlighted in bold and the second best in italics (‘better’ means significantly better according to both a paired t-test and a McNemar test of correct elements in biclusters).
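Section 6.1 is not included in this excerpt. Assuming the consensus score follows the usual recipe (Jaccard similarity between two biclusters computed on the matrix cells they cover, a one-to-one matching of found and true biclusters that maximizes total similarity via the Hungarian algorithm, and division by the size of the larger set), it could be sketched with SciPy's `linear_sum_assignment`. Representing a bicluster as a (gene set, sample set) pair is an assumption made here.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def jaccard(b1, b2):
    """Jaccard index of two biclusters (gene_set, sample_set) on covered cells."""
    (g1, s1), (g2, s2) = b1, b2
    inter = len(g1 & g2) * len(s1 & s2)              # shared (gene, sample) cells
    union = len(g1) * len(s1) + len(g2) * len(s2) - inter
    return inter / union if union else 0.0

def consensus_score(found, true):
    """Sum of matched similarities divided by the size of the larger set."""
    if not found or not true:
        return 0.0
    sim = np.array([[jaccard(f, t) for t in true] for f in found])
    r, c = linear_sum_assignment(sim, maximize=True)  # Hungarian matching
    return float(sim[r, c].sum() / max(len(found), len(true)))

true = [({0, 1, 2}, {0, 1}), ({5, 6}, {3, 4})]
print(consensus_score(true, true))   # → 1.0
```

Dividing by the larger set size penalizes both missed biclusters and spurious extra ones.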
Fig. 2. An example of FABIA model selection. The data have 10 true biclusters, and the model was trained with 13 biclusters. Only for visualization purposes, the biclusters are generated as contiguous blocks. Top: data (left) and noise-free data (right). Middle: factors $Z$. Bottom: data reconstructed by the FABIA model as $\Lambda Z$ (left) and loadings $\Lambda$ (right). The lines indicate three biclusters and connect each bicluster in the reconstructed data with its corresponding factors (middle) and loadings (bottom right).
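One way to read the figure: the model has more components than true biclusters, and only components with large loadings and factors form biclusters. The sketch below extracts members by thresholding $|\lambda_i|$ and $|z_i|$ and drops components that end up empty. The threshold values and the function name `extract_biclusters` are hypothetical, and this is a simplified illustration rather than FABIA's exact extraction procedure.

```python
import numpy as np

def extract_biclusters(loadings, factors, thr_l=0.5, thr_z=0.5):
    """Return (genes, samples) index lists per component whose loadings and
    factors exceed the (hypothetical) thresholds; empty components are dropped."""
    bics = []
    for i in range(loadings.shape[1]):
        genes = np.flatnonzero(np.abs(loadings[:, i]) > thr_l)
        samples = np.flatnonzero(np.abs(factors[i, :]) > thr_z)
        if genes.size and samples.size:   # unused component: nothing passes
            bics.append((genes.tolist(), samples.tolist()))
    return bics

# Three components; the middle one is near zero, as an unneeded component would be.
loadings = np.zeros((6, 3))
loadings[[0, 1], 0] = [1.2, -0.9]
loadings[:, 1] = 0.01
loadings[4, 2] = 2.0
factors = np.zeros((3, 5))
factors[0, [0, 2]] = [1.0, -1.5]
factors[1, :] = 0.02
factors[2, 3] = 0.8
print(extract_biclusters(loadings, factors))   # → [([0, 1], [0, 2]), ([4], [3])]
```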
Results on the breast cancer, multiple tissue samples and DLBCL datasets, measured by the consensus score defined in Section 6.1
| Method | Breast cancer score | #bc | #g | #s | Multiple tissues score | #bc | #g | #s | DLBCL score | #bc | #g | #s |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| | | 3 | 92 | 31 | 0.53 | 5 | 356 | 29 | | 2 | 59 | 62 |
| | | 3 | 144 | 32 | 0.44 | 5 | 435 | 30 | | 2 | 104 | 60 |
| | 0.17 | 5 | 87 | 24 | 0.31 | 5 | 431 | 24 | 0.18 | 5 | 50 | 42 |
| | | 5 | 500 | 38 | 0.56 | 5 | 1903 | 35 | 0.30 | 5 | 339 | 72 |
| | | 5 | 175 | 38 | 0.50 | 5 | 571 | 42 | 0.28 | 5 | 143 | 63 |
| | 0.29 | 5 | 56 | 29 | 0.23 | 5 | 71 | 26 | 0.21 | 5 | 68 | 47 |
| | | 5 | 796 | 35 | | 5 | 3711 | 31 | 0.28 | 5 | 389 | 68 |
| | 0.34 | 5 | 194 | 35 | | 5 | 583 | 34 | 0.27 | 5 | 95 | 61 |
| | 0.16 | 5 | 5 | 26 | 0.20 | 5 | 11 | 25 | 0.18 | 5 | 4 | 68 |
| | 0.03 | 25 | 55 | 4 | 0.05 | 29 | 230 | 6 | 0.01 | 56 | 26 | 8 |
| | 0.25 | 2 | 466 | 42 | 0.37 | 3 | 1904 | 28 | 0.22 | 1 | 267 | 74 |
| | 0.22 | 1 | 742 | 33 | 0.35 | 3 | 2856 | 28 | 0.18 | 2 | 385 | 58 |
| | 0.04 | 12 | 172 | 8 | 0.04 | 19 | 643 | 12 | 0.03 | 6 | 162 | 4 |
| | 0.02 | 38 | 37 | 7 | 0.03 | 59 | 53 | 8 | 0.02 | 38 | 19 | 15 |
| | 0.01 | 79 | 33 | 8 | 0.01 | 128 | 53 | 9 | 0.01 | 70 | 18 | 14 |
| | 0.07 | 5 | 61 | 6 | 0.11 | 5 | 628 | 6 | 0.05 | 5 | 9 | 9 |
| | 0.01 | 1 | 1213 | 97 | 0.10 | 4 | 35 | 5 | 0.07 | 5 | 73 | 5 |
| | 0.11 | 5 | 12 | 12 | nc | nc | nc | nc | 0.05 | 5 | 10 | 10 |
| | 0.24 | 2 | 40 | 23 | 0.38 | 5 | 255 | 22 | 0.17 | 1 | 3 | 44 |
| | 0.23 | 2 | 24 | 20 | 0.39 | 5 | 274 | 24 | 0.11 | 3 | 6 | 24 |
| | 0.12 | 13 | 198 | 28 | 0.37 | 5 | 395 | 20 | 0.05 | 28 | 133 | 32 |
| | 0.07 | 14 | 77 | 22 | 0.21 | 1 | 117 | 39 | 0.08 | 8 | 82 | 44 |
| | 0.04 | 5 | 343 | 5 | nc | nc | nc | nc | 0.03 | 5 | 167 | 5 |
An ‘nc’ entry means that the method did not converge for this dataset. The best results are in bold and the second best in italics (again ‘better’ means significantly better according to a paired t-test). The columns ‘#bc’, ‘#g’ and ‘#s’ provide the numbers of biclusters, their average numbers of genes and their average numbers of samples, respectively.
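The three summary columns are simple averages over the biclusters a method reports. A minimal sketch, assuming each bicluster is given as a (genes, samples) pair of index lists; the helper name `summarize` is hypothetical, and the table shows these averages as whole numbers.

```python
def summarize(biclusters):
    """Return (#bc, #g, #s): the number of biclusters and their mean numbers
    of genes and samples. Each bicluster is a (genes, samples) pair."""
    n = len(biclusters)
    if n == 0:
        return 0, 0.0, 0.0
    mean_g = sum(len(g) for g, _ in biclusters) / n
    mean_s = sum(len(s) for _, s in biclusters) / n
    return n, mean_g, mean_s

print(summarize([([1, 2, 3], [0, 1]), ([4], [2, 3, 4, 5])]))   # → (2, 2.0, 3.0)
```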