Quang-Huy Nguyen, Tin Nguyen, Duc-Hau Le.
Abstract
BACKGROUND: Cancer remains one of the leading causes of death worldwide, and the accumulation of genes carrying mutations is thought to be chiefly responsible for the establishment and development of the disease. Identifying and analyzing driver genes is therefore vital. Our previous study showed that there was no agreement on a unifying pipeline for these tasks and introduced a complete one. However, that pipeline gradually revealed its weaknesses: it was unfamiliar to non-technical users, time-consuming, and inconvenient.
Keywords: Clinical feature; Driver gene; Genetic biomarker; Human breast cancer; Mouse metabolic syndrome; Omics data
Year: 2022 PMID: 35247965 PMCID: PMC8897886 DOI: 10.1186/s12859-022-04606-0
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169
Fig. 1 Framework of DrGA. DrGA, equipped with four widely used analyses, automatically processes the identified driver genes and provides users with the analysis results, which are either saved directly to a predefined R working directory or printed in the R console
Four agglomeration methods that DrGA considers automatically in order to select the most appropriate one
| Cluster distance measure | Description | Formula |
|---|---|---|
| Single method | The distance between two clusters, c1 and c2, is defined as the shortest distance between two points, x1 and x2, one in each cluster | $D(c_1, c_2) = \min_{x_1 \in c_1,\, x_2 \in c_2} D(x_1, x_2)$ |
| Complete method | The distance between two clusters, c1 and c2, is defined as the longest distance between two points, x1 and x2, one in each cluster | $D(c_1, c_2) = \max_{x_1 \in c_1,\, x_2 \in c_2} D(x_1, x_2)$ |
| Average method | The distance between two clusters, c1 and c2, is defined as the average distance between each point in one cluster and every point in the other cluster | $D(c_1, c_2) = \frac{1}{\lvert c_1 \rvert \lvert c_2 \rvert} \sum_{x_1 \in c_1} \sum_{x_2 \in c_2} D(x_1, x_2)$ |
| Ward’s method | Minimizes the total within-cluster error sum of squares: at each stage, it identifies the pair of groups with the minimum between-group distance and merges those two | $TD_{c_1 \cup c_2} = \sum_{x \in c_1 \cup c_2} D(x, \mu_{c_1 \cup c_2})^2$ |

$D(X, Y)$, the distance between $X$ and $Y$; $c_1$ and $c_2$, cluster 1 and cluster 2; $x_1$ and $x_2$, a point in cluster 1 and a point in cluster 2; $TD$, total distance; $\mu$, mean
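The four linkage rules described in the table can be sketched in a few lines of Python. This is a minimal illustration, not DrGA's implementation: it assumes Euclidean distance for D, assumes that Ward's criterion is evaluated as the total squared distance of the merged points to their common mean, and uses hypothetical function names and toy clusters.

```python
import math
from itertools import product


def euclidean(p, q):
    """D(X, Y): Euclidean distance between two points (an assumption)."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def single_linkage(c1, c2):
    """Shortest distance between a point in c1 and a point in c2."""
    return min(euclidean(x1, x2) for x1, x2 in product(c1, c2))


def complete_linkage(c1, c2):
    """Longest distance between a point in c1 and a point in c2."""
    return max(euclidean(x1, x2) for x1, x2 in product(c1, c2))


def average_linkage(c1, c2):
    """Average distance over all cross-cluster pairs of points."""
    total = sum(euclidean(x1, x2) for x1, x2 in product(c1, c2))
    return total / (len(c1) * len(c2))


def ward_total_distance(c1, c2):
    """Sum of squared distances of the merged points to their mean."""
    merged = list(c1) + list(c2)
    dim = len(merged[0])
    mu = [sum(p[i] for p in merged) / len(merged) for i in range(dim)]
    return sum(euclidean(p, mu) ** 2 for p in merged)


# Hypothetical toy clusters, used only for illustration.
c1 = [(0.0, 0.0), (0.0, 1.0)]
c2 = [(3.0, 0.0), (3.0, 1.0)]

single = single_linkage(c1, c2)
complete = complete_linkage(c1, c2)
average = average_linkage(c1, c2)
ward = ward_total_distance(c1, c2)
```

For these toy clusters the single-linkage distance is the smallest of the three pairwise measures and the complete-linkage distance the largest, with the average-linkage distance in between.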
Fig. 2 Illustration of the four agglomeration methods included in DrGA. Two subgroups are shown for illustration purposes. a Single-linkage method. b Complete-linkage method. c Average-linkage method. d Ward’s minimum variance method. c1 and c2, cluster 1 and cluster 2
Fig. 4 Identification of the optimal number of subgroups. a The Connectivity index selected two subgroups. b The Dunn index selected two subgroups. c The Silhouette index selected two subgroups. d Differences in expression events between the identified groups. Two distinct groups were found (pink and orange)
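As a rough illustration of how an internal validity index can rank candidate numbers of subgroups, the sketch below computes the mean silhouette width for two candidate labelings and keeps the better one. It is only a sketch under stated assumptions: the toy points, the candidate labelings, and the function names are hypothetical, Euclidean distance is assumed, and the Connectivity and Dunn indices are not covered.

```python
import math


def dist(p, q):
    """Euclidean distance between two points (an assumption)."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def mean_silhouette(points, labels):
    """Mean silhouette width of a labeling; higher means better separation."""
    clusters = {}
    for p, lab in zip(points, labels):
        clusters.setdefault(lab, []).append(p)

    scores = []
    for p, lab in zip(points, labels):
        own = clusters[lab]
        if len(own) == 1:
            scores.append(0.0)  # usual convention for singleton clusters
            continue
        # a: mean distance to the other members of the same cluster
        a = sum(dist(p, q) for q in own) / (len(own) - 1)
        # b: smallest mean distance to the members of any other cluster
        b = min(
            sum(dist(p, q) for q in other) / len(other)
            for other_lab, other in clusters.items()
            if other_lab != lab
        )
        denom = max(a, b)
        scores.append((b - a) / denom if denom > 0 else 0.0)
    return sum(scores) / len(scores)


# Hypothetical toy data: two well-separated groups of three points.
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]

# Hypothetical candidate labelings for k = 2 and k = 3 subgroups.
candidates = {
    2: [0, 0, 0, 1, 1, 1],
    3: [0, 0, 1, 2, 2, 2],
}

scores = {k: mean_silhouette(points, labels) for k, labels in candidates.items()}
best_k = max(scores, key=scores.get)
```

On this toy data the two-subgroup labeling scores higher than the three-subgroup one, so `best_k` is 2.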
Fig. 3 Analysis results from module 3 of DrGA. a DrGA discovered 12 co-expressed modules, shown with the number of genes and the top five hub genes in each module. b Associations between each module and the eight selected clinical features. weight_g, bodyweight of mice (gram unit); length_cm, body length of mice (centimeters unit); ab_fat, abdominal fat; total_fat, total fat; UC, ulcerative colitis; FFA, free fatty acids; Glucose, glycemic index; LDL_plus_VLDL, LDL and VLDL cholesterol levels
Comparison between the involved subgroups in terms of the chosen clinical features
| Clinical feature | Subgroup 1 (N = 125) | Subgroup 2 (N = 7) | p-value |
|---|---|---|---|
| weight_g | 38.2 (6.21) | 36.5 (2.24) | 0.110 |
| length_cm | 10.2 (0.34) | 10.2 (0.36) | 1.000 |
| ab_fat | 2.53 [1.74;3.20] | 2.04 [1.86;2.27] | 0.268 |
| total_fat | 4.91 [3.97;5.86] | 3.96 [3.55;4.19] | 0.059 |
| UC | 460 (122) | 417 (122) | 0.401 |
| FFA | 109 (29.0) | 86.0 (28.7) | 0.079 |
| Glucose | 432 (97.4) | 375 (71.9) | 0.086 |
| LDL_plus_VLDL | 1196 (315) | 1103 (246) | 0.371 |
For the first two continuous variables (weight_g and length_cm) and the last four continuous variables (UC, FFA, Glucose, and LDL_plus_VLDL), the first two columns show the mean (standard deviation). For the remaining two variables, ab_fat and total_fat, the first two columns show the median [25th percentile; 75th percentile]
weight_g, bodyweight of mice (gram unit); length_cm, body length of mice (centimeters unit); ab_fat, abdominal fat; total_fat, total fat; UC, ulcerative colitis; FFA, free fatty acids; Glucose, glycemic index; and LDL_plus_VLDL, LDL and VLDL cholesterol levels
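The table cells use two layouts, "x (y)" and "x [a;b]". The sketch below shows one way such cells could be produced, assuming they stand for mean (standard deviation) and median [25th percentile; 75th percentile]. The helper names and the toy values are hypothetical, and the quartile method (linear interpolation that includes the sample endpoints) is also an assumption.

```python
import statistics


def mean_sd(values):
    """Format a variable as 'mean (standard deviation)'."""
    return f"{statistics.mean(values):.2f} ({statistics.stdev(values):.2f})"


def median_iqr(values):
    """Format a variable as 'median [25th percentile;75th percentile]'."""
    # "inclusive" interpolates between data points and treats the sample
    # minimum and maximum as the 0th and 100th percentiles.
    q1, q2, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return f"{q2:.2f} [{q1:.2f};{q3:.2f}]"


# Hypothetical toy measurements, not taken from the study.
toy = [1.0, 2.0, 3.0, 4.0, 5.0]

mean_cell = mean_sd(toy)
median_cell = median_iqr(toy)
```

The mean-based summary suits roughly symmetric variables, while the median-based summary is less sensitive to skewed values, which is a common reason for mixing the two layouts in one table.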