Chaohui Xiao1, Fuchuan Wang2, Tianye Jia3, Liru Pan1, Zhaohai Wang1.
Abstract
With the rapid growth of computer storage capacity and the continuing development of complex algorithms, big data analysis must cope with exponentially growing volumes of data. Based on omics data such as mRNA, microRNA, and DNA methylation data, this study applies traditional clustering methods, including k-means, K-nearest neighbors, hierarchical clustering, affinity propagation, and nonnegative matrix factorization, to classify samples into categories. The findings are: (1) The assumption that attributes are mutually independent reduces the classification performance of the algorithm to some extent. Following the idea of a multilevel grid, there is a one-to-one mapping from high-dimensional space to one dimension; encoding the one-dimensional cells of the hierarchical grid greatly reduces the complexity. The logic of the algorithm is relatively simple, and its classification efficiency is very stable. (2) Converting the two-dimensional representation of the data into a one-dimensional binary representation achieves dimensionality reduction and improves the organization and storage efficiency of the data. The grid code expresses the spatial position of the data, preserves the original organization of the data, and does not abstract the data objects. (3) Handling nondiscrete data and missing values provides a new opportunity for identifying protein targets of small-molecule therapies and yields a better classification result. (4) A comparison of the three models shows that Naive Bayes is the optimal model. Each iteration alternates an expectation step with a maximization step, and the results are then identified and quantified by mass spectrometry (MS).
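The one-to-one mapping from high-dimensional space to one dimension described in finding (1) can be sketched with Morton (Z-order) bit interleaving, a standard way to encode a multilevel grid cell as a single integer. This is an illustrative reconstruction under that assumption, not code from the paper; `interleave_bits` is a hypothetical helper name.

```python
def interleave_bits(x: int, y: int, bits: int = 16) -> int:
    """Map a 2-D grid cell (x, y) to a 1-D Morton (Z-order) code.

    The bits of x occupy the even positions of the code and the bits
    of y the odd positions, so the mapping is one-to-one and nearby
    grid cells tend to stay close on the resulting 1-D curve.
    """
    code = 0
    for i in range(bits):
        code |= ((x >> i) & 1) << (2 * i)       # bit i of x -> even slot
        code |= ((y >> i) & 1) << (2 * i + 1)   # bit i of y -> odd slot
    return code
```

Because the interleaving is reversible, the one-dimensional code preserves the spatial position of each cell rather than abstracting it away, which matches the property claimed in finding (2).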
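Finding (4) singles out Naive Bayes, whose attribute-independence assumption is also the limitation noted in finding (1). A minimal Gaussian Naive Bayes sketch, assuming continuous features and class-conditional independence (an illustration, not the authors' implementation; `gaussian_nb_predict` is a hypothetical name):

```python
import math

def gaussian_nb_predict(train, labels, x):
    """Predict the class of x with Gaussian Naive Bayes.

    Fits a per-class, per-feature Gaussian from the training rows and
    returns the class with the highest log-posterior, summing feature
    log-likelihoods under the independence assumption.
    """
    best_class, best_score = None, float("-inf")
    for c in sorted(set(labels)):
        rows = [t for t, l in zip(train, labels) if l == c]
        score = math.log(len(rows) / len(train))        # log prior
        for j in range(len(x)):
            vals = [r[j] for r in rows]
            mu = sum(vals) / len(vals)
            var = sum((v - mu) ** 2 for v in vals) / len(vals) + 1e-9
            # log of the Gaussian density for feature j
            score += -0.5 * math.log(2 * math.pi * var) \
                     - (x[j] - mu) ** 2 / (2 * var)
        if score > best_score:
            best_class, best_score = c, score
    return best_class
```

In an EM-style pipeline of the kind the abstract mentions, the expectation step would assign soft class responsibilities with such per-class densities and the maximization step would refit the per-class means and variances.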
Year: 2022 PMID: 36017150 PMCID: PMC9398858 DOI: 10.1155/2022/4004130
Source DB: PubMed Journal: Comput Math Methods Med ISSN: 1748-670X Impact factor: 2.809
Figure 1Conceptual model.
Figure 2Screening process.
Differences in the expression of hepatocellular carcinoma-related genes.
| Gene | GeneBank | LogFC | FDR | Gene | GeneBank | LogFC | FDR |
|---|---|---|---|---|---|---|---|
| CXCL11 | NM_005409.5 | 1 | 2.51 | IDO1 | NM_002164.6 | 4 | 2.44 |
| KLK3 | NM_001648.2 | 10 | 2.07 | SLAM | NM_020125.3 | 2 | 2.01 |
| WARS | NM_004184.4 | 2 | 2.8 | OLFM | NM_006418.5 | 8 | 2.48 |
| GBP4 | NM_052941.5 | 8 | 2.77 | APOC1 | NM_001645.5 | 10 | 2.02 |
| CD300LF | NM_139018.5 | 2 | 2.29 | KRT81 | NM_002281.4 | 8 | 2.22 |
| ZNF683 | NM_001114759.3 | 9 | 2.61 | FCGR1B | NR_164759.1 | 5 | 2.06 |
| CXCL10 | NM_001565.4 | 8 | 2.35 | OR2B2 | NM_033057.2 | 7 | 2.8 |
| LAM | NM_014398.4 | 3 | 2.98 | CETN1 | NM_004066.3 | 10 | 2.78 |
| GBP5 | NM_052942.5 | 1 | 2.9 | RUNX1 | NM_001754.5 | 5 | 2.65 |
| ELANE | NM_001972.4 | 3 | 2.69 | DNASE2B | NM_021233.3 | 4 | 2.29 |
| EPSTI1 | NM_001002264.4 | 8 | 2.35 | IFI30 | NM_006332.5 | 3 | 2.1 |
| HAPLN3 | NM_001307952.2 | 9 | 2.6 | DM | NM_021951.3 | 1 | 2.02 |
Figure 3Differences in the expression of hepatocellular carcinoma-related genes.
Figure 4Expression data.
Test point properties.
| Property | EARFCN_1 | PCI_1 | RSRP_1 | EARFCN_2 | PCI_2 | RSRP_2 |
|---|---|---|---|---|---|---|
| Collect_Time | 2.32 | 12.13 | 12 | 5.15 | 13 | 9 |
| IMEI | 2.45 | 3.37 | 15 | 4.24 | 3 | 4 |
| LAT | 2.79 | 18.84 | 11 | 3.23 | 19 | 5 |
| LNG | 2.52 | 10.18 | 14 | 2.32 | 20 | 4 |
| ECI | 2.55 | 11.95 | 15 | 2.27 | 20 | 19 |
| EARFCN | 2.52 | 13.83 | 17 | 8.64 | 13 | 10 |
| PCI | 2.31 | 7.31 | 17 | 5.35 | 2 | 3 |
| RSRP | 2.63 | 6.85 | 14 | 9.77 | 19 | 17 |
Figure 5Test point properties.
Classification algorithm training.
| Algorithm | CRS | EPSG | Extent | Level |
|---|---|---|---|---|
| M2SMF | 3 | 8 | 15 | 2.43 |
| SNF | 1 | 5 | 14 | 7.86 |
| PAM50 | 4 | 5 | 15 | 2.15 |
| iCluster | 4 | 10 | 16 | 4.71 |
| kmeans | 4 | 5 | 13 | 5.44 |
| pins | 2 | 6 | 17 | 2.59 |
| MCCA | 2 | 5 | 12 | 7.63 |
Figure 6Classification algorithm training.
Algorithm optimization.
| Dataset/method | Naive Bayes | Okumura-Hata | AdaBoost |
|---|---|---|---|
| LAML | 0.92 | 0.03 | 0.38 |
| KIRC | 0.96 | 0.37 | 0.07 |
| LIHC | 0.67 | 0.5 | 0.55 |
| M2SMF | 0.71 | 0.32 | 0.54 |
| SNF | 0.52 | 0.49 | 0.27 |
| MCCA | 0.67 | 0.37 | 0.51 |
| PINS | 0.68 | 0.8 | 0.36 |
Figure 7Algorithm optimization.