He-Ming Chu, Jin-Xing Liu, Ke Zhang, Chun-Hou Zheng, Juan Wang, Xiang-Zhen Kong.
Abstract
Biclustering is an effective tool for processing gene expression datasets. Biclustering methods handle two kinds of data matrices: binary and non-binary. A binary matrix is usually obtained by converting pre-processed gene expression data, which effectively reduces interference from noise and abnormal data, and is then processed by a biclustering algorithm. However, existing biclustering algorithms for binary data strike a poor balance between running time and performance. To address this drawback, we propose a new biclustering algorithm for binary data, the Adjacency Difference Matrix Binary Biclustering algorithm (AMBB). AMBB constructs an adjacency matrix from adjacency difference values, and the submatrix obtained by continuously updating the adjacency difference matrix is called a bicluster. The adjacency matrix allows genes that respond similarly under different conditions to be grouped into clusters, which is important for subsequent gene analysis. Experiments on synthetic and real datasets demonstrate that AMBB is highly practical.
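The binarization step mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not state which conversion rule is used, so the per-gene mean threshold below, the function name `binarize`, and the toy values are all assumptions made for illustration only.

```python
from statistics import mean

def binarize(expression, threshold=None):
    """Convert a genes x conditions matrix into 0/1 values.

    Assumption (not from the paper): a gene counts as "on" in a condition
    when its value exceeds the gene's own mean, or a fixed threshold if
    one is supplied.
    """
    binary = []
    for row in expression:
        cutoff = mean(row) if threshold is None else threshold
        binary.append([1 if value > cutoff else 0 for value in row])
    return binary

# Toy data: 3 genes (rows) measured under 4 conditions (columns).
expression = [
    [2.0, 8.0, 7.5, 1.0],
    [5.0, 5.5, 0.5, 6.0],
    [0.2, 0.1, 0.3, 9.0],
]
binary = binarize(expression)
for row in binary:
    print(row)
```

A binary biclustering algorithm would then search this 0/1 matrix for submatrices dominated by ones.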
Keywords: Adjacency matrix; Biclustering; Binary data; Gene expression data
Year: 2022 PMID: 36123637 PMCID: PMC9484244 DOI: 10.1186/s12859-022-04842-4
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.307
Fig. 1 Number of biclusters obtained for each row threshold. a Results for the 50 × 50 synthetic dataset. b Results for the 100 × 100 synthetic dataset. c Results for the 200 × 200 synthetic dataset
Fig. 2 A brief schematic of the AMBB algorithm. a shows the pre-processed matrix. b shows the selection of seeds. c shows the construction of the difference matrix. d shows the acquisition of biclusters
Fig. 3 Diagram of the two seed-selection methods of AMBB. a and b show the two methods of obtaining seeds
AMBB biclustering algorithm
Fig. 4 Comparison of methods on the synthetic datasets. a–c show the comparison of the two AMBB methods. d–f show the comparison of the four biclustering algorithms
Fig. 5 Running-time comparison of the four methods
Fig. 6 Three bicluster patterns: a Shift pattern without noise. b Shift pattern with noise. c Scale pattern without noise. d Scale pattern with noise. e Shift-scale pattern without noise. f Shift-scale pattern with noise
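The three patterns named in Fig. 6 can be sketched with the definitions commonly used in the biclustering literature: each row is a base profile plus a row offset (shift), times a row factor (scale), or both (shift-scale). The exact generator used for the paper's synthetic data is not given here, so the helper names `make_bicluster` and `add_noise` and all numeric values are hypothetical.

```python
import random

def make_bicluster(base, shifts=None, scales=None):
    """Build rows of the form scales[i] * base[j] + shifts[i]."""
    n_rows = len(shifts) if shifts is not None else len(scales)
    shifts = shifts if shifts is not None else [0.0] * n_rows
    scales = scales if scales is not None else [1.0] * n_rows
    return [[s * b + t for b in base] for s, t in zip(scales, shifts)]

def add_noise(matrix, sigma, seed=0):
    """Add Gaussian noise to every entry (assumed noise model)."""
    rng = random.Random(seed)
    return [[v + rng.gauss(0.0, sigma) for v in row] for row in matrix]

base = [1.0, 3.0, 2.0, 5.0]  # shared column profile

shift = make_bicluster(base, shifts=[0.0, 2.0, 4.0])
scale = make_bicluster(base, scales=[1.0, 2.0, 0.5])
shift_scale = make_bicluster(base, shifts=[0.0, 1.0, -1.0],
                             scales=[1.0, 2.0, 0.5])
noisy_shift = add_noise(shift, sigma=0.2)

for name, m in [("shift", shift), ("scale", scale),
                ("shift-scale", shift_scale)]:
    print(name, m)
```

In the noise-free variants every row follows the base profile exactly up to its offset or factor. The noisy variants blur that relationship, which is what makes recovery harder.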
Parameter values used on the synthetic datasets for the four biclustering algorithms
| Pattern | Type | AMBB | Bimax | BiBit | QUBIC |
|---|---|---|---|---|---|
| Shift-pattern | Non-noise | m = 5 | m = 5 | c = 0.55 | |
| Shift-pattern | Noise | m = 5 | m = 5 | c = 0.55 | |
| Scale-pattern | Non-noise | m = 4 | m = 4 | c = 0.85 | |
| Scale-pattern | Noise | m = 6 | m = 10 | c = 0.55 | |
| Shift-scale pattern | Non-noise | m = 4 | m = 4 | c = 0.85 | |
| Shift-scale pattern | Noise | m = 7 | m = 10 | c = 0.45 | |
| Overlapping | Non-noise (2 × 5) | m = 6 | m = 2 | c = 0.85 | |
| Overlapping | Noise (2 × 5) | m = 2 | m = 6 | c = 0.75 | |
| Overlapping | Non-noise (5 × 5) | m = 7 | m = 10 | c = 0.75 | |
| Overlapping | Noise (5 × 5) | m = 4 | m = 2 | c = 0.65 | |
Fig. 7 Results of the overlapping bicluster experiments: a Noise-free data matrix with overlapping biclusters of size 2 × 5. b Noisy data matrix with overlapping biclusters of size 2 × 5. c Noise-free data matrix with overlapping biclusters of size 5 × 5. d Noisy data matrix with overlapping biclusters of size 5 × 5
Information on the datasets
| Dataset | Number of genes | Number of conditions | Species |
|---|---|---|---|
| Pollen | 14,805 | 80 | Homo |
| Buettner | 8989 | 182 | Mouse |
| GDS3715 | 4697 | 94 | Homo |
| GSE7904 | 21,653 | 62 | Homo |
Enrichment results of the four algorithms on the four real datasets
| Dataset | Method | Ratio | Count | BP term (P-value) | CC term (P-value) | MF term (P-value) |
|---|---|---|---|---|---|---|
| Pollen | AMBB | 0.7629 | 621/814 | | | |
| Pollen | Bimax | 0.75 | 108/144 | integrin-mediated signaling pathway (1.50E-05) | Membrane (1.00E-07) | ATP binding (1.50E-15) |
| Pollen | Plaid | | 34/36 | cellular lipid metabolic process (1.60E-06) | Membrane (1.30E-06) | ATPase activity, coupled to transmembrane movement of substances (3.70E-18) |
| Pollen | QUBIC | 0.8378 | 124/148 | acyl-CoA metabolic process (9.40E-15) | mitochondrial matrix (1.30E-07) | metalloendopeptidase activity (1.10E-18) |
| Buettner | AMBB | | 518/704 | | | |
| Buettner | Bimax | 0.6222 | 28/45 | Ossification (1.50E-04) | Nucleus (2.00E-04) | transmembrane transporter activity (9.50E-03) |
| Buettner | Plaid | 0.4848 | 16/33 | Ossification (4.90E-03) | Nucleus (1.90E-03) | nucleotidyltransferase activity (3.00E-02) |
| Buettner | QUBIC | 0.7321 | 46/51 | negative regulation of apoptotic process (2.70E-03) | cytoplasmic vesicle (1.50E-03) | double-stranded RNA binding (8.00E-03) |
| GDS3715 | AMBB | 0.735 | 294/400 | | | |
| GDS3715 | Bimax | 0.7769 | 94/121 | inorganic anion transport (2.10E-08) | Membrane (3.40E-07) | ATPase activity, coupled to transmembrane movement of substances (9.10E-22) |
| GDS3715 | Plaid | 0.7736 | 41/53 | transmembrane transport (8.60E-11) | ATP-binding cassette (ABC) transporter complex (1.50E-05) | ATPase activity, coupled to transmembrane movement of substances (9.70E-16) |
| GDS3715 | QUBIC | | 55/68 | transmembrane transport (2.20E-11) | ATP-binding cassette (ABC) transporter complex (2.25E-05) | ATPase activity, coupled to transmembrane movement of substances (5.90E-17) |
| GSE7904 | AMBB | 0.8723 | 41/47 | | | structural constituent of ribosome (1.70E-38) |
| GSE7904 | Bimax | | 54/57 | transmembrane transport (2.60E-13) | intracellular membrane-bounded organelle (3.00E-04) | ATPase activity, coupled to transmembrane movement of substances (7.50E-25) |
| GSE7904 | Plaid | 0.9091 | 30/33 | transmembrane transport (8.20E-22) | integral component of membrane (1.40E-05) | ATPase activity, coupled to transmembrane movement of substances (8.70E-29) |
| GSE7904 | QUBIC | 0.828 | 77/93 | transmembrane transport (2.10E-33) | integral component of membrane (1.60E-07) | |
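
In most rows of the enrichment table where both values are present, the Ratio column equals the Count fraction rounded to four decimals (for example, 108/144 = 0.75 and 621/814 = 0.7629). The sketch below recomputes the ratio from a Count string; the helper name `enrichment_ratio` is hypothetical, and the relationship is inferred from the table rather than stated in the text.

```python
def enrichment_ratio(count):
    """Return enriched/total, rounded to 4 decimals, from a string like '108/144'."""
    enriched, total = (int(part) for part in count.split("/"))
    return round(enriched / total, 4)

# Rows from the table where Ratio and Count agree.
print(enrichment_ratio("108/144"))  # Pollen, Bimax
print(enrichment_ratio("621/814"))  # Pollen, AMBB
print(enrichment_ratio("30/33"))    # Plaid row listed with ratio 0.9091

# The Buettner/QUBIC row lists Ratio 0.7321 with Count 46/51; the
# recomputed value differs, so that row is worth checking against
# the original publication.
print(enrichment_ratio("46/51"))
```

A check of this kind is a quick way to spot rows where one of the two columns was damaged during extraction.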