Juan A Nepomuceno¹, Alicia Troncoso², Isabel A Nepomuceno-Chamorro¹, Jesús S Aguilar-Ruiz².
Abstract
BACKGROUND: Biclustering algorithms search gene expression data for groups of genes that behave the same way under a subset of samples. The biological knowledge now available in public repositories can be used to drive these algorithms towards biclusters made up of functionally coherent groups of genes. A distance between genes can also be defined from their annotations in the Gene Ontology (GO): gene pairwise GO semantic similarity measures assign each pair of genes a value that quantifies their functional similarity. This paper studies a scatter search-based algorithm that optimizes a merit function integrating GO information, which enters through a term based on a GO semantic similarity measure.
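The results below compare two gene pairwise GO measures, simUI and simGIC. The sketch that follows uses their commonly cited definitions. simUI is the number of GO terms two genes share divided by the number of terms in the union, after each gene's annotation set is extended with all ancestor terms. simGIC computes the same ratio but weights every term by its information content, IC(t) = -log p(t). The toy ontology, the gene names, and the choice to turn pairwise values into a single bicluster score by averaging over all gene pairs are illustrative assumptions. They are not the paper's implementation, and the sketch does not reproduce how that score is weighted inside the merit function.

```python
import math
from itertools import combinations

# Hypothetical toy GO fragment (child -> is_a parents); these are NOT real GO identifiers.
PARENTS = {
    "root": set(),
    "A": {"root"},
    "B": {"root"},
    "A1": {"A"},
    "A2": {"A"},
    "B1": {"B"},
}


def with_ancestors(terms, parents=PARENTS):
    """Return the given terms plus all of their ancestors (true-path rule)."""
    closed, stack = set(), list(terms)
    while stack:
        term = stack.pop()
        if term not in closed:
            closed.add(term)
            stack.extend(parents[term])
    return closed


def sim_ui(terms1, terms2):
    """simUI: number of shared terms divided by the number of terms in the union."""
    a, b = with_ancestors(terms1), with_ancestors(terms2)
    return len(a & b) / len(a | b)


def information_content(annotations):
    """IC(t) = -log p(t), where p(t) is the fraction of genes annotated to t or a descendant."""
    n = len(annotations)
    counts = {}
    for terms in annotations.values():
        for term in with_ancestors(terms):
            counts[term] = counts.get(term, 0) + 1
    return {term: -math.log(c / n) for term, c in counts.items()}


def sim_gic(terms1, terms2, ic):
    """simGIC: like simUI, but every term is weighted by its information content."""
    a, b = with_ancestors(terms1), with_ancestors(terms2)
    total = sum(ic[t] for t in a | b)
    return sum(ic[t] for t in a & b) / total if total > 0 else 0.0


def bicluster_go_score(genes, annotations, measure):
    """Average pairwise similarity over all gene pairs of a bicluster (assumed aggregation)."""
    pairs = list(combinations(genes, 2))
    if not pairs:
        return 0.0
    return sum(measure(annotations[g], annotations[h]) for g, h in pairs) / len(pairs)


annotations = {"g1": {"A1"}, "g2": {"A2"}, "g3": {"B1"}}
ic = information_content(annotations)
print(sim_ui({"A1"}, {"A2"}))  # 0.5
print(bicluster_go_score(["g1", "g2", "g3"], annotations, sim_ui))
print(bicluster_go_score(["g1", "g2", "g3"], annotations,
                         lambda x, y: sim_gic(x, y, ic)))
```

simGIC is usually stricter than simUI because terms near the root carry little information content. In the toy example, the shared root contributes nothing to the simGIC score.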
Keywords: Biclustering of gene expression data; Gene pairwise GO measures; Scatter search metaheuristic
Year: 2018 PMID: 29610579 PMCID: PMC5872503 DOI: 10.1186/s13040-018-0165-9
Source DB: PubMed Journal: BioData Min ISSN: 1756-0381 Impact factor: 2.522
Human datasets related to cancer clinical data used in the experimental study
| Dataset | Size (genes × samples) | Experimental context |
|---|---|---|
| GDS3289 | 971 × 104 | A prostate cancer study of disease progression from benign epithelium to the metastatic stage. |
| GDS2415 | 1690 × 59 | A breast carcinoma tumor study in patients with breast-conserving therapy. |
| GDS2918 | 4587 × 20 | A study of blood plasma from patients with colorectal cancer. |
| GDS3966 | 10296 × 83 | An analysis of melanoma samples in different stages of the disease. |
| GDS3139 | 12270 × 29 | A histological analysis of normal breast epithelia in patients with breast cancer. |
| GDS4794 | 16925 × 65 | A lung cancer study of small cells in initial stages of the disease. |
Fig. 1 Biclusters obtained by the biclustering algorithm for each fitness function for the GDS1116 yeast dataset
Fig. 2 Biclusters obtained by the biclustering algorithm for each fitness function for the 15mM_diamide yeast dataset
Fig. 3 Biclusters obtained by the biclustering algorithm for each fitness function for the 25mM_DTT yeast dataset
Biclusters obtained by the biclustering algorithm for each fitness function for the GDS3289, GDS2415, GDS2918, GDS3966, GDS3139 and GDS4794 datasets
| Dataset | GO measure | Parameters | Size (genes × samples) | Enriched biclusters (%), BP |
|---|---|---|---|---|
| GDS3289 | simUI | (2, 1, 1) | (15.3 × 14.0) | 38 |
| GDS3289 | simUI | (2, 1, 2) | (9.4 × 14.0) | 57 |
| GDS3289 | simUI | (2, 2, 1) | (20.0 × 3.0) | 39 |
| GDS3289 | simGIC | (2, 1, 1) | (17.1 × 13.5) | 28 |
| GDS3289 | simGIC | (2, 1, 2) | (10.6 × 14.0) | |
| GDS3289 | simGIC | (2, 2, 1) | (28.4 × 3.0) | 18 |
| GDS3289 | 0 | (2, 1, 0) | (21.6 × 14.3) | 1 |
| GDS3289 | 0 | (2, 2, 0) | (40.6 × 3.4) | 5 |
| GDS2415 | simUI | (2, 1, 1) | (18.1 × 13.9) | 4 |
| GDS2415 | simUI | (2, 1, 2) | (9.1 × 13.4) | 36 |
| GDS2415 | simUI | (2, 2, 1) | (21.9 × 3.1) | 8 |
| GDS2415 | simGIC | (2, 1, 1) | (22.9 × 3.1) | 0 |
| GDS2415 | simGIC | (2, 1, 2) | (13.4 × 12.5) | |
| GDS2415 | simGIC | (2, 2, 1) | (35.3 × 3.0) | 8 |
| GDS2415 | 0 | (2, 1, 0) | (23.2 × 13.4) | 0 |
| GDS2415 | 0 | (2, 2, 0) | (41.7 × 3.2) | 0 |
| GDS2918 | simUI | (2, 1, 1) | (40.3 × 6.8) | 1 |
| GDS2918 | simUI | (2, 1, 2) | (40.0 × 6.8) | 0 |
| GDS2918 | simUI | (2, 2, 1) | (53.3 × 3.4) | 2 |
| GDS2918 | simGIC | (2, 1, 1) | (31.0 × 6.7) | 10 |
| GDS2918 | simGIC | (2, 1, 2) | (13.3 × 7.3) | |
| GDS2918 | simGIC | (2, 2, 1) | (34.4 × 3.4) | 23 |
| GDS2918 | 0 | (2, 1, 0) | (36.5 × 7.2) | 1 |
| GDS2918 | 0 | (2, 2, 0) | (54.0 × 3.3) | 1 |
| GDS3966 | simUI | (2, 1, 1) | (26.2 × 17.8) | 4 |
| GDS3966 | simUI | (2, 1, 2) | (26.5 × 18.3) | 6 |
| GDS3966 | simUI | (2, 2, 1) | (41.8 × 3.4) | 1 |
| GDS3966 | simGIC | (2, 1, 1) | (23.2 × 17.8) | 11 |
| GDS3966 | simGIC | (2, 1, 2) | (13.9 × 19.1) | |
| GDS3966 | simGIC | (2, 2, 1) | (35.7 × 3.3) | 6 |
| GDS3966 | 0 | (2, 1, 0) | (26.5 × 18.3) | 3 |
| GDS3966 | 0 | (2, 2, 0) | (42.3 × 3.4) | 0 |
| GDS3139 | simUI | (2, 1, 1) | (35.6 × 13.0) | 2 |
| GDS3139 | simUI | (2, 1, 2) | (35.2 × 13.2) | 3 |
| GDS3139 | simUI | (2, 2, 1) | (28.9 × 9.7) | 1 |
| GDS3139 | simGIC | (2, 1, 1) | (31.6 × 13.4) | 7 |
| GDS3139 | simGIC | (2, 1, 2) | (18.4 × 13.1) | |
| GDS3139 | simGIC | (2, 2, 1) | (26.9 × 9.9) | 4 |
| GDS3139 | 0 | (2, 1, 0) | (35.1 × 13.4) | 2 |
| GDS3139 | 0 | (2, 2, 0) | (27.7 × 9.8) | 3 |
| GDS4794 | simUI | (2, 1, 1) | (26.0 × 17.1) | 5 |
| GDS4794 | simUI | (2, 1, 2) | (25.7 × 17.2) | 3 |
| GDS4794 | simUI | (2, 2, 1) | (46.3 × 3.8) | 0 |
| GDS4794 | simGIC | (2, 1, 1) | (23.7 × 16.7) | 13 |
| GDS4794 | simGIC | (2, 1, 2) | (17.0 × 16.43) | |
| GDS4794 | simGIC | (2, 2, 1) | (36.6 × 3.7) | 10 |
| GDS4794 | 0 | (2, 1, 0) | (25.7 × 17.1) | 4 |
| GDS4794 | 0 | (2, 2, 0) | (46.8 × 3.8) | 1 |
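The last column gives the percentage of biclusters enriched in GO Biological Process (BP) terms. The record does not say which enrichment tool, correction method, or threshold produced these numbers. The sketch below is therefore one common way to compute such a percentage, under stated assumptions: a one-sided hypergeometric test per term, a Bonferroni correction across the tested terms, and alpha = 0.05. The gene universe and the BP term are hypothetical.

```python
from math import comb


def hypergeom_sf(k, M, n, N):
    """P(X >= k) for X ~ Hypergeometric(M genes, n annotated to the term, N drawn)."""
    return sum(comb(n, i) * comb(M - n, N - i)
               for i in range(k, min(n, N) + 1)) / comb(M, N)


def is_enriched(bicluster_genes, term_to_genes, universe, alpha=0.05):
    """True if at least one GO term is over-represented after Bonferroni correction."""
    genes = set(bicluster_genes) & universe
    M, N = len(universe), len(genes)
    n_tests = len(term_to_genes)
    for term_genes in term_to_genes.values():
        annotated = term_genes & universe
        k = len(genes & annotated)
        if k == 0:
            continue  # a term with no hits cannot be over-represented
        p = hypergeom_sf(k, M, len(annotated), N)
        if min(1.0, p * n_tests) < alpha:
            return True
    return False


def percent_enriched(biclusters, term_to_genes, universe, alpha=0.05):
    """Percentage of biclusters with at least one enriched term."""
    if not biclusters:
        return 0.0
    hits = sum(is_enriched(b, term_to_genes, universe, alpha) for b in biclusters)
    return 100.0 * hits / len(biclusters)


universe = {f"g{i}" for i in range(20)}
bp_terms = {"GO_term_X": {"g0", "g1", "g2", "g3", "g4"}}  # hypothetical BP term
biclusters = [["g0", "g1", "g2", "g3", "g4"], ["g10", "g11", "g12"]]
print(percent_enriched(biclusters, bp_terms, universe))  # 50.0
```

Restricting `term_to_genes` to BP terms gives a number comparable in spirit to the BP column. A real analysis would use a full GO annotation file, and possibly a less conservative correction than Bonferroni.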
Fig. 4 Overlap among the biclusters obtained by the (212)-simGIC fitness function configuration for the GDS4794 dataset
Fig. 5 Histogram of the percentage of overlap among the biclusters obtained by the (212)-simGIC fitness function configuration for the GDS4794 dataset
Fig. 6 Overlap among the biclusters obtained by the (212)-simGIC fitness function configuration for the GDS4794 dataset, considering only their genes
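Figs. 4 to 6 summarize how much the biclusters overlap, both over whole matrix cells and over genes alone. The record does not define the overlap percentage. The sketch below assumes it is the Jaccard index (shared elements over the union) multiplied by 100, computed over (gene, sample) cells or over gene sets only, and then binned into 10% bins for a histogram like the one in Fig. 5. The toy biclusters are hypothetical.

```python
from itertools import combinations


def cells(genes, samples):
    """All (gene, sample) cells covered by a bicluster."""
    return {(g, s) for g in genes for s in samples}


def jaccard(a, b):
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def pairwise_overlap(biclusters, genes_only=False):
    """Percentage overlap (Jaccard x 100) for every pair of biclusters.

    Each bicluster is a (genes, samples) pair; with genes_only=True only the
    gene sets are compared, ignoring the samples.
    """
    sets = [set(g) if genes_only else cells(g, s) for g, s in biclusters]
    return {(i, j): 100.0 * jaccard(sets[i], sets[j])
            for i, j in combinations(range(len(sets)), 2)}


def histogram(percentages, bin_width=10):
    """Count values in [0, 10), [10, 20), ...; 100 falls in the last bin."""
    n_bins = 100 // bin_width
    counts = [0] * n_bins
    for v in percentages:
        counts[min(int(v // bin_width), n_bins - 1)] += 1
    return counts


biclusters = [({"g1", "g2"}, {"s1", "s2"}),
              ({"g2", "g3"}, {"s2", "s3"}),
              ({"g4"}, {"s4"})]
print(histogram(pairwise_overlap(biclusters).values()))
print(histogram(pairwise_overlap(biclusters, genes_only=True).values()))
```

Comparing the two histograms shows why a gene-only view (as in Fig. 6) can report more overlap than a cell-level view: two biclusters may share genes while covering different samples.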
Group of enriched biclusters related to cancer obtained with the (212)-simGIC fitness function for the GDS4794 dataset
| Bicluster id. | Oncogenes | Candidate cancer genes | Number of genes in each bicluster |
|---|---|---|---|
| BRIP1 | 6 | ||
| CRTC1, KLF6 | ERF | 8 | |
| PIK3R1 | 12 | ||
| FANCD2 | 11 | ||
| SMARCE1 | 13 | ||
| ATP2B3 | AMPH, ANK2 | 18 | |
| RPL22 | 12 | ||
| BLM, MSH2, REL, MYC | SMAD2 | 18 | |
| EZH2, TFE3, ACSL6 | 28 | ||
| ELF4 | GNA13 | 11 | |
| PALB2, TPR, NUP98, NUP214 | CHD4, DBR1 | 16 | |
| ZHX2 | 7 | ||
| CLCN4 | 9 | ||
| SNRPA, DBR1 | 13 | ||
| NR3C2, CHD2 | 18 | ||
| TTK, PHIP, GLI3 | 16 | ||
| GRM3 | 10 | ||
| PRKCG, RASGEF1A | 17 | ||
| CACNA2D1 | 16 | ||
| PTPRT, NGEF, GRIA3, CHST1, DUSP7 | 17 | ||
| AMPH, BRINP3, SPTBN4, RBMX | 22 | ||
| NCOR1, NCOR2, RBMX, TCEB1 | 19 | ||
| PPM1D, TDG, RNF103, CTIF | 17 | ||
| SLC25A48, TAF1, RASSF6 | 46 |
Fig. 7 Biological study of the biclusters obtained by the (212)-simGIC configuration for the GDS4794 dataset: highlighted GO term observed in the results
Mapping analysis provided by Reactome for the bi53 bicluster obtained with the (212)-simGIC fitness function
| Pathway identifier | Pathway name | FDR | Entities found | Entities total |
|---|---|---|---|---|
| R-HSA-3304347 | Loss of Function of SMAD4 in Cancer | 5.27E-11 | 2 | 3 |
| R-HSA-3311021 | SMAD4 MH2 Domain Mutants in Cancer | 5.27E-11 | 2 | 3 |
| R-HSA-3304356 | SMAD2/3 Phosphorylation Motif Mutants in Cancer | 0.002 | 2 | 7 |
| R-HSA-3304349 | Loss of Function of SMAD2/3 in Cancer | 0.002 | 2 | 9 |
| R-HSA-3315487 | SMAD2/3 MH2 Domain Mutants in Cancer | 0.002 | 2 | 9 |
| R-HSA-3656532 | TGFBR1 KD Mutants in Cancer | 0.002 | 2 | 9 |
| R-HSA-3656534 | Loss of Function of TGFBR1 in Cancer | 0.002 | 2 | 9 |
| R-HSA-3304351 | Signaling by TGF-beta Receptor Complex in Cancer | 0.002 | 2 | 10 |
| R-HSA-2894858 | Signaling by NOTCH1 HD+PEST Domain Mutants in Cancer | 0.017 | 2 | 68 |
| R-HSA-2644602 | Signaling by NOTCH1 PEST Domain Mutants in Cancer | 0.017 | 2 | 68 |
| R-HSA-2644603 | Signaling by NOTCH1 in Cancer | 0.017 | 2 | 68 |
Only pathways whose names include the word "cancer" are shown in this table. Complete information on all 133 pathways found can be downloaded as an Excel file from the link given in the Availability of data and materials section.
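The FDR column comes from Reactome's over-representation analysis. As far as we understand Reactome's documentation, the tool corrects the per-pathway p-values with the Benjamini-Hochberg procedure. The sketch below implements that standard procedure. The underlying p-values for bi53 are not listed in this record, so the input values in the example are made up.

```python
def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])  # indices by ascending p
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity of the adjusted values.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


print(benjamini_hochberg([0.01, 0.04, 0.03, 0.005]))
```

Because the adjustment depends on the rank of each p-value among all tested pathways, pathways with identical "found/total" counts, such as those listed with 2 of 9 entities, receive the same FDR.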