Chun-Mei Feng, Yong Xu, Mi-Xiao Hou, Ling-Yun Dai, Jun-Liang Shang.
Abstract
BACKGROUND: In recent years, identifying differentially expressed genes and clustering samples have become hot topics in bioinformatics. Principal Component Analysis (PCA) is widely used to analyze gene expression data. However, it has two limitations. First, it does not exploit the geometric structure hidden in the data, e.g., the pairwise distances between data points, even though this information can facilitate sample clustering. Second, the Principal Components (PCs) determined by PCA are dense, which makes them hard to interpret. Only a few genes are related to a given cancer, so identifying a handful of differentially expressed genes and finding new cancer biomarkers is of great significance for the early diagnosis and treatment of cancer.
Keywords: Differentially expressed genes; Gene expression data; Graph Laplacian; Principal component analysis; Sparse constraint
Year: 2019 PMID: 31888433 PMCID: PMC6936054 DOI: 10.1186/s12859-019-3229-z
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169
Summary of the two datasets
| Data set | Number of samples | Number of genes | Normal samples | Disease samples |
|---|---|---|---|---|
| PAAD | 180 | 20,502 | 4 | 176 |
| HNSC | 418 | 20,502 | 20 | 398 |
Fig. 1 Overlap among the differentially expressed genes identified by the compared methods
Identification accuracy (IA) and total relevance score (TRS) of the six methods on the PAAD and HNSC datasets
| Methods | PAAD IA | PAAD TRS | HNSC IA | HNSC TRS |
|---|---|---|---|---|
| Z-SPCA | 77.00 | 901.67 | 53.00 | 540.91 |
| GPower | 77.00 | 922.27 | 43.00 | 378.70 |
| PathSPCA | 61.00 | 682.56 | | 579.06 |
| SPCArt | 77.00 | 901.67 | 53.00 | 540.91 |
| gLPCA | 75.00 | 878.39 | 50.00 | 513.94 |
| gLSPCA | ||||
Functions of the differentially expressed genes on the PAAD dataset identified by gLSPCA but not by the other methods
| Gene name | Function | Relevance score |
|---|---|---|
| PPY | This gene encodes a member of the neuropeptide Y (NPY) family of peptides. | 21.13 |
| CD24 | This gene encodes a sialoglycoprotein that is expressed on mature granulocytes and B cells and modulates growth and differentiation signals to these cells. | 8.27 |
Functions of the differentially expressed genes on the HNSC dataset identified by gLSPCA but not by the other methods
| Gene name | Function | Relevance score |
|---|---|---|
| HSPA1A | This intronless gene encodes a 70 kDa heat shock protein which is a member of the heat shock protein 70 family. | 7.67 |
| COL6A1 | The collagens are a superfamily of proteins that play a role in maintaining the integrity of various tissues. | 4.05 |
Fig. 2 The interacting proteins network of the identified differentially expressed genes
Graphical presentation of the interacting proteins network of the differentially expressed genes identified by gLSPCA but not the other compared methods.
Clustering accuracy (ACC) of all methods
| Datasets | All-Ge | Z-SPCA | GPower | PathSPCA | SPCArt | gLPCA | gLSPCA |
|---|---|---|---|---|---|---|---|
| PAAD | 83.09 | 95.00 | 95.00 | 95.00 | 96.35 | 95.00 | |
| HNSC | 78.23 | 75.84 | 77.51 | 72.73 | 75.84 | 79.43 | |
Notes: “All-Ge” denotes clustering on all features (genes) without any dimensionality reduction
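The record does not state how ACC is computed. A common convention for clustering accuracy, assumed here, is to map cluster ids to class labels in the way that maximizes agreement and report the percentage of correctly assigned samples. The function name `clustering_accuracy` and the toy labels are hypothetical, and the exhaustive search over mappings is only practical for a small number of clusters (two classes, normal and disease, in these datasets).

```python
from itertools import permutations

# Placeholder target for clusters that cannot be matched to any class.
_UNMATCHED = object()


def clustering_accuracy(true_labels, cluster_labels):
    """Best-match clustering accuracy in percent.

    Tries every one-to-one mapping from cluster ids to class labels and
    keeps the mapping that yields the most correctly assigned samples.
    """
    if len(true_labels) != len(cluster_labels):
        raise ValueError("label lists must have the same length")
    if not true_labels:
        raise ValueError("label lists must not be empty")

    classes = sorted(set(true_labels))
    clusters = sorted(set(cluster_labels))
    # Pad with placeholders when there are more clusters than classes.
    targets = classes + [_UNMATCHED] * max(0, len(clusters) - len(classes))

    best_hits = 0
    for assigned in permutations(targets, len(clusters)):
        mapping = dict(zip(clusters, assigned))
        hits = sum(
            1 for t, c in zip(true_labels, cluster_labels) if mapping[c] == t
        )
        best_hits = max(best_hits, hits)
    return 100.0 * best_hits / len(true_labels)


# Toy example (not data from the paper): one disease sample lands in the
# cluster that best matches "normal".
truth = ["normal", "normal", "disease", "disease", "disease"]
found = [1, 1, 0, 0, 1]
acc = clustering_accuracy(truth, found)
```

For larger numbers of clusters, the same matching is usually solved with the Hungarian algorithm instead of enumerating permutations.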
Construct the weight matrix. Compute the diagonal matrix. Compute … Compute the optimal … Compute the diagonal matrix.
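The symbols for these steps are lost in this record, so the following is only a sketch of how the first two steps are commonly realized in graph-Laplacian methods, not the authors' exact construction. It assumes a symmetric k-nearest-neighbour graph with binary weights, a diagonal degree matrix D with D[i][i] equal to the sum of row i of W, and the unnormalized Laplacian L = D - W. The function name `knn_graph_laplacian`, the parameter `k`, and the toy samples are hypothetical.

```python
def knn_graph_laplacian(samples, k=2):
    """Build a binary k-NN weight matrix W, degree matrix D, and L = D - W.

    `samples` is a list of equal-length numeric feature vectors.
    Assumption: binary weights and a symmetrized k-NN graph; the paper's
    actual weighting scheme is not given in this record.
    """
    n = len(samples)

    def sq_dist(a, b):
        # Squared Euclidean distance between two feature vectors.
        return sum((x - y) ** 2 for x, y in zip(a, b))

    # Weight matrix: W[i][j] = 1 if j is among the k nearest neighbours
    # of i, or i is among the k nearest neighbours of j.
    W = [[0.0] * n for _ in range(n)]
    for i in range(n):
        others = [j for j in range(n) if j != i]
        others.sort(key=lambda j: sq_dist(samples[i], samples[j]))
        for j in others[:k]:
            W[i][j] = 1.0
            W[j][i] = 1.0  # symmetrize

    # Diagonal degree matrix: row sums of W on the diagonal.
    D = [[sum(W[i]) if i == j else 0.0 for j in range(n)] for i in range(n)]

    # Unnormalized graph Laplacian.
    L = [[D[i][j] - W[i][j] for j in range(n)] for i in range(n)]
    return W, D, L


# Toy example (not data from the paper): two well-separated pairs of points.
points = [[0.0], [1.0], [10.0], [11.0]]
W, D, L = knn_graph_laplacian(points, k=1)
```

Each row of L sums to zero by construction, which is a quick sanity check on the weight and degree matrices.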