Ronald D Hagan, Michael A Langston.
Abstract
Recent discoveries of distinct molecular subtypes have led to remarkable advances in treatment for a variety of diseases. While subtyping via unsupervised clustering has received a great deal of interest, most approaches rely on basic statistical or machine learning methods. At the same time, techniques based on graph clustering, particularly clique-based strategies, have been used successfully to identify disease biomarkers and gene networks. This paper describes a graph-theoretical approach, based on the paraclique algorithm, that can easily be employed to identify putative disease subtypes and that can also aid in outlier detection. The feasibility and potential effectiveness of this method are demonstrated on publicly available gene co-expression data derived from patient samples covering twelve different disease families.
Keywords: molecular subtyping; outlier detection; paraclique algorithm; transcriptomic data
Year: 2021 PMID: 36092474 PMCID: PMC9455766 DOI: 10.3390/a14020063
Source DB: PubMed Journal: Algorithms ISSN: 1999-4893
Figure 1. An illustration of the paraclique algorithm with glom term g = 2. (a) Starting with a maximum clique of size 4, shown as red vertices, (b) paraclique first gloms onto vertex 5 (c) and then onto vertex 8, forming a paraclique of size 6.
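The glomming process in Figure 1 can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the graph is an adjacency-set dictionary, the seed maximum clique is found with a basic Bron-Kerbosch search with pivoting (exponential time, so only suitable for small graphs), and a vertex is assumed to qualify for glomming when it is adjacent to all but at most g members of the current paraclique. Adding one qualifying vertex at a time, most-connected first, is also an assumption, and the helper names `build_graph`, `maximum_clique`, and `paraclique` are hypothetical.

```python
from typing import Dict, Hashable, Iterable, Set, Tuple

Graph = Dict[Hashable, Set[Hashable]]


def build_graph(edges: Iterable[Tuple[Hashable, Hashable]]) -> Graph:
    """Build an undirected adjacency-set graph from an edge list."""
    graph: Graph = {}
    for u, v in edges:
        graph.setdefault(u, set()).add(v)
        graph.setdefault(v, set()).add(u)
    return graph


def maximum_clique(graph: Graph) -> Set[Hashable]:
    """Find one maximum clique with Bron-Kerbosch plus pivoting (small graphs only)."""
    best: Set[Hashable] = set()

    def expand(r: Set[Hashable], p: Set[Hashable], x: Set[Hashable]) -> None:
        nonlocal best
        if not p and not x:
            # r is a maximal clique; keep it if it is the largest seen so far.
            if len(r) > len(best):
                best = set(r)
            return
        # Branch only on vertices not adjacent to the pivot.
        pivot = max(p | x, key=lambda u: len(graph[u] & p))
        for v in list(p - graph[pivot]):
            expand(r | {v}, p & graph[v], x & graph[v])
            p = p - {v}
            x = x | {v}

    expand(set(), set(graph), set())
    return best


def paraclique(graph: Graph, g: int) -> Set[Hashable]:
    """Grow a maximum clique by glomming onto vertices that miss at most g members."""
    members = maximum_clique(graph)
    while True:
        candidates = [
            v for v in graph
            if v not in members and len(members) - len(graph[v] & members) <= g
        ]
        if not candidates:
            return members
        # Add the candidate with the most neighbours inside the paraclique,
        # then re-test the others against the enlarged paraclique.
        members.add(max(candidates, key=lambda v: len(graph[v] & members)))


if __name__ == "__main__":
    example = build_graph([(1, 2), (1, 3), (1, 4), (2, 3), (2, 4), (3, 4),
                           (5, 1), (5, 2), (5, 3), (6, 1)])
    print(sorted(paraclique(example, g=1)))  # [1, 2, 3, 4, 5]
```

In the small example, the seed clique has four vertices; vertex 5 misses only one of them and is glommed, while vertex 6 touches a single member and stays out. Raising g relaxes how densely a newly added vertex must be connected to the paraclique.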
Subtyping datasets. A profile of the datasets used in this study.
| Disease | GEO Accession | Case Patients | Control Patients | Initial Probes | Filtered Probes |
|---|---|---|---|---|---|
| Asthma | GSE4302 | 42 | 28 | 54,675 | 2322 |
| Breast Cancer | GSE10810 | 31 | 27 | 18,382 | 11,531 |
| Chronic Lymphocytic Leukemia | GSE8835 | 24 | 12 | 22,283 | 1338 |
| Colorectal Cancer | GSE9348 | 70 | 12 | 54,675 | 22,968 |
| Lung Cancer | GSE7670 | 27 | 27 | 22,283 | 7458 |
| Multiple Sclerosis | GDS3920 | 14 | 15 | 54,674 | 9844 |
| Pancreatic Cancer | GDS4102 | 36 | 16 | 54,613 | 23,711 |
| Parkinson’s Disease | GSE20141 | 10 | 8 | 54,674 | 6625 |
| Prostate Cancer | GSE6919 | 61 | 63 | 12,625 | 1531 |
| Psoriasis | GSE13355 | 58 | 58 | 54,675 | 29,407 |
| Schizophrenia | GSE17612 | 28 | 23 | 54,675 | 4250 |
| Type 2 Diabetes | GSE20966 | 10 | 10 | 61,294 | 93 |
Subgroups identified. Summary of the numbers and sizes of putative subgroups identified by our method in the testing data.
| Disease | Subgroups Identified | Subgroup Sizes |
|---|---|---|
| Asthma | 3 | 31, 8, 3 |
| Breast Cancer | 2 | 22, 5 |
| Chronic Lymphocytic Leukemia | 2 | 4, 18 |
| Colorectal Cancer | 2 | 63, 5 |
| Lung Cancer | 2 | 21, 5 |
| Multiple Sclerosis | 2 | 11, 3 |
| Pancreatic Cancer | 2 | 31, 5 |
| Parkinson’s Disease | 1 | 8 |
| Prostate Cancer | 2 | 56, 3 |
| Psoriasis | 2 | 49, 5 |
| Schizophrenia | 2 | 19, 6 |
| Type 2 Diabetes | 1 | 9 |
GO enrichments. Listed for each disease is the GO term category with the lowest enrichment p-value among its 100 most differentially expressed genes.
| Dataset | GO Category | Enrichment p-value |
|---|---|---|
| Asthma GSE4302 | Oxidoreductase | 1.1 × 10⁻⁴ |
| Breast Cancer GSE10810 | Secreted | 1.0 × 10⁻¹³ |
| Chronic Lymphocytic Leukemia GSE8835 | MHC II | 2.4 × 10⁻¹⁵ |
| Colorectal Cancer GSE9348 | Translational elongation | 2.8 × 10⁻²⁸ |
| Lung Cancer GSE7670 | Secreted | 7.7 × 10⁻¹⁰ |
| Multiple Sclerosis GDS3920 | Translational elongation | 1.9 × 10⁻³⁴ |
| Pancreatic Cancer GDS4102 | Signal | 4.59 × 10⁻¹⁵ |
| Prostate Cancer GSE6919 | Translational elongation | 4.92 × 10⁻⁴⁶ |
| Psoriasis GSE13355 | Immune response | 3.5 × 10⁻¹⁵ |
| Schizophrenia GSE17612 | Organelle membrane | 5.24 × 10⁻⁴ |
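
The selection rule in this table, keeping the GO category with the lowest enrichment p-value among the top 100 differentially expressed genes, can be sketched as below. The source does not say which enrichment tool or test produced these p-values; this sketch assumes a one-sided hypergeometric (over-representation) test without multiple-testing correction. The function names and inputs (`annotations` mapping each category to its genes, `background` as the set of all measured genes) are hypothetical.

```python
from math import comb
from typing import Dict, Iterable, Optional, Set, Tuple


def hypergeom_sf(k: int, population: int, successes: int, draws: int) -> float:
    """P(X >= k) for X ~ Hypergeometric(population, successes, draws)."""
    total = comb(population, draws)
    upper = min(successes, draws)
    # Sum the probabilities of observing k or more annotated genes among the draws.
    hits = sum(comb(successes, i) * comb(population - successes, draws - i)
               for i in range(k, upper + 1))
    return hits / total


def best_go_category(top_genes: Iterable[str], background: Set[str],
                     annotations: Dict[str, Iterable[str]]) -> Optional[Tuple[str, float]]:
    """Return the (category, p-value) pair with the lowest over-representation p-value."""
    top = set(top_genes) & background
    best: Optional[Tuple[str, float]] = None
    for category, genes in annotations.items():
        members = set(genes) & background
        overlap = len(members & top)
        if overlap == 0:
            continue  # no top gene carries this annotation
        p = hypergeom_sf(overlap, len(background), len(members), len(top))
        if best is None or p < best[1]:
            best = (category, p)
    return best
```

In practice `top_genes` would be the 100 genes ranked highest by a differential-expression statistic, and `annotations` would come from a GO annotation source for the array platform.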
Cluster compositions based on known subtypes. Shown is the breakdown into known subtypes of the clusters obtained by paraclique, k-means, and hierarchical clustering, for the datasets with the best available ground truth.
Paraclique results, Gastric Cancer:

| Subtype | Paraclique (size 29) | Paraclique (size 16) |
|---|---|---|
| proliferative | 1 | 12 |
| invasive | 19 | 1 |
| metabolic | 9 | 3 |

Paraclique results, NSCLC:

| Subtype | Paraclique (size 26) | Paraclique (size 12) | Paraclique (size 8) |
|---|---|---|---|
| AC | 23 | 0 | 8 |
| SCC | 3 | 12 | 0 |
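
The source reports these compositions without a summary score. As a hedged illustration, one common way to condense such a table is cluster purity: for each cluster, count the samples that belong to its majority subtype, then divide the sum by the number of clustered samples. The `purity` helper below is hypothetical and simply applies that arithmetic to the paraclique counts reported above.

```python
from typing import Dict, List


def purity(composition: Dict[str, List[int]]) -> float:
    """Fraction of clustered samples that belong to their cluster's majority subtype.

    composition[subtype][j] is the number of samples of that subtype in cluster j.
    """
    rows = list(composition.values())
    n_clusters = len(rows[0])
    total = sum(sum(row) for row in rows)
    majority = sum(max(row[j] for row in rows) for j in range(n_clusters))
    return majority / total


# Counts transcribed from the paraclique composition table.
gastric = {"proliferative": [1, 12], "invasive": [19, 1], "metabolic": [9, 3]}
nsclc = {"AC": [23, 0, 8], "SCC": [3, 12, 0]}

print(round(purity(gastric), 3))  # 0.689
print(round(purity(nsclc), 3))    # 0.935
```

For gastric cancer this is (19 + 12) / 45, and for NSCLC (23 + 12 + 8) / 46. Purity rewards splitting the data into many small clusters, so it is best read alongside the breakdown itself rather than in place of it.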
Figure 2. Outlier detection using paracliques. (a) A normalized threshold of 1.0 usually produces an empty graph. (b) As the threshold is lowered, more edges are added and paracliques begin to form and merge. (c) If a vertex consistently joins no paraclique, then it is flagged as a potential outlier.
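The threshold sweep in Figure 2 can be sketched as follows, again as an illustration rather than the authors' code. Assumptions: `sim` is a symmetric matrix of pairwise sample similarities already normalized to [0, 1]; at each threshold an edge joins two samples whose similarity meets it; paracliques are extracted repeatedly from the remaining vertices, each seeded by a maximum clique of at least `min_size` vertices (a hypothetical parameter); a vertex gloms on when it misses at most g current members; and "consistently joins no paraclique" is read as "joins none at any threshold tested". All function names are hypothetical.

```python
from itertools import combinations
from typing import Dict, List, Set

Graph = Dict[int, Set[int]]


def threshold_graph(sim: List[List[float]], t: float) -> Graph:
    """Connect samples i and j when their normalized similarity is at least t."""
    graph: Graph = {i: set() for i in range(len(sim))}
    for i, j in combinations(range(len(sim)), 2):
        if sim[i][j] >= t:
            graph[i].add(j)
            graph[j].add(i)
    return graph


def max_clique(graph: Graph, vertices: Set[int]) -> Set[int]:
    """Exhaustive branch-and-bound search for a largest clique within `vertices`."""
    best: Set[int] = set()

    def expand(r: Set[int], p: Set[int]) -> None:
        nonlocal best
        if len(r) + len(p) <= len(best):
            return  # this branch cannot beat the current best
        if not p:
            best = set(r)
            return
        for v in list(p):
            expand(r | {v}, p & graph[v])
            p = p - {v}

    expand(set(), set(vertices))
    return best


def paracliques(graph: Graph, g: int, min_size: int) -> List[Set[int]]:
    """Repeatedly extract paracliques, each seeded by a maximum clique of the remaining vertices."""
    # With min_size > g, every glommed vertex needs at least one neighbour in the paraclique.
    assert min_size > g, "seed cliques must be larger than the glom term"
    remaining = set(graph)
    found: List[Set[int]] = []
    while remaining:
        members = max_clique(graph, remaining)
        if len(members) < min_size:
            break  # no sufficiently large seed clique is left
        while True:
            candidates = [v for v in remaining - members
                          if len(members) - len(graph[v] & members) <= g]
            if not candidates:
                break
            members.add(max(candidates, key=lambda v: len(graph[v] & members)))
        found.append(members)
        remaining -= members
    return found


def flag_outliers(sim: List[List[float]], thresholds: List[float],
                  g: int = 1, min_size: int = 3) -> Set[int]:
    """Flag samples that join no paraclique at any threshold in the sweep."""
    never_joined = set(range(len(sim)))
    for t in sorted(thresholds, reverse=True):  # sweep from strict to lenient
        graph = threshold_graph(sim, t)
        for members in paracliques(graph, g, min_size):
            never_joined -= members
    return never_joined
```

Tracking which samples ever join a paraclique, rather than cluster labels at one cutoff, avoids having to pick a single threshold in advance.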