Abstract
Genes reported in the breast cancer literature have not been confirmed as vital to breast cancer formation, because the reported accuracies are not convincing, even though current biological knowledge links them directly to the disease. The goal is to identify vital genes with the highest possible accuracy (ideally 100%) and with convincing causal patterns beyond what is already known about breast cancer. Finding gene-gene interaction signatures and functional effects may help solve this puzzle. This research applies a recently developed competing linear factor analysis method to the detection of differentially expressed genes in order to advance the study of breast cancer formation. Surprisingly, in 1 study of triple-negative breast cancer (TNBC; 54 675 genes and 265 samples), 3 genes are found to be differentially expressed between TNBC and non-TNBC (Her2, Luminal A, Luminal B) samples with 100% sensitivity and 100% specificity. These 3 genes show a clear signature pattern by which TNBC patients can be grouped. In a second TNBC study (54 673 genes and 66 samples), 4 genes achieve the same 100% sensitivity and 100% specificity. Four genes also achieve 100% sensitivity and 100% specificity in a third breast cancer study (54 675 genes and 121 samples), and the same 4 genes achieve 100% sensitivity and 96.5% specificity in a fourth breast cancer study (60 483 genes and 1217 samples). These results indicate that the 4-gene classifiers are robust and accurate. The detected genes naturally classify patients into subtypes, for example, 7 subtypes. Compared with findings reported in the literature, these results show the clearest gene-gene interaction patterns and functional effects, using the smallest numbers of genes and reaching the highest accuracy. The 4 genes are considered essential for breast cancer research and practice.
They can support focused, targeted research and precision medicine for each subtype of breast cancer. The classified subtypes may also reveal new breast cancer disease types, for which new effective therapies could then be developed.
Keywords: Direct and indirect effects; breast cancer detection; functional effects; gene-gene interaction; joint risk competing
Year: 2022 PMID: 35185329 PMCID: PMC8851495 DOI: 10.1177/11769351221076360
Source DB: PubMed Journal: Cancer Inform ISSN: 1176-9351
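The sensitivity and specificity figures quoted in the abstract can be read against the usual definitions: sensitivity is the fraction of true cases flagged as cases, and specificity is the fraction of non-cases flagged as non-cases. The sketch below is only an illustration of those definitions. The labels and risk values are made-up toy numbers, and the 0.5 cut-off on the risk probability is an assumption, not something stated in this record.

```python
def sensitivity_specificity(labels, predictions):
    """Return (sensitivity, specificity) for binary labels and predictions."""
    tp = fn = tn = fp = 0
    for truth, predicted in zip(labels, predictions):
        if truth == 1 and predicted == 1:
            tp += 1
        elif truth == 1 and predicted == 0:
            fn += 1
        elif truth == 0 and predicted == 0:
            tn += 1
        else:
            fp += 1
    return tp / (tp + fn), tn / (tn + fp)


# Toy data (hypothetical): 1 = case, 0 = non-case.
labels = [1, 1, 1, 0, 0, 0, 0]
risk = [0.97, 0.99, 0.74, 0.12, 0.02, 0.08, 0.90]

# Assumed decision rule: call a sample a case when its risk exceeds 0.5.
predictions = [1 if p > 0.5 else 0 for p in risk]

sens, spec = sensitivity_specificity(labels, predictions)
print(f"sensitivity={sens:.2f} specificity={spec:.2f}")
```

In this toy set every case is detected, while one of the four non-cases is misclassified, which lowers specificity but not sensitivity.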
Gene information, expression values, competing factors, risk probabilities.
The first (TNBC) dataset
| ID | TNBC/No | 236872_at | 241480_at | 62987_r_at | CF1 | CF2 | CFmax | Pmax |
|---|---|---|---|---|---|---|---|---|
| GSM1974566 | 1 | 6.09 | 7.42 | 10.55 | 3.37 | 1.19 | 3.37 | 0.97 |
| GSM1974567 | 1 | 5.80 | 7.35 | 10.23 | 4.32 | 3.47 | 4.32 | 0.99 |
| ... |
| GSM1974763 | 1 | 5.82 | 7.25 | 10.20 | 4.15 | 2.52 | 4.15 | 0.98 |
| GSM1978883 | 0 | 6.15 | 6.81 | 11.50 | −2.01 | −4.26 | −2.01 | 0.12 |
| ... |
| GSM1978948 | 0 | 6.17 | 6.69 | 11.92 | −3.93 | −5.47 | −3.93 | 0.02 |
| GSM1978949 | 0 | 6.21 | 6.73 | 11.73 | −3.11 | −5.45 | −3.11 | 0.04 |

The second (TNBC) dataset
| ID | TNBC/No | 220471_s_at | 220987_s_at | 228880_at | 62987_r_at | CF1 | CF2 | CFmax | Pmax |
|---|---|---|---|---|---|---|---|---|---|
| GSM1588970 | 1 | 2.37 | 5.26 | 5.47 | 8.08 | 1.58 | 0.13 | 1.58 | 0.83 |
| GSM1588971 | 1 | 2.36 | 5.40 | 4.86 | 8.25 | 1.97 | 0.25 | 1.97 | 0.88 |
| ... |
| GSM1589150 | 0 | 2.42 | 4.98 | 8.83 | 8.64 | −6.16 | −0.26 | −0.26 | 0.44 |
| GSM1589151 | 0 | 2.46 | 5.02 | 8.95 | 8.57 | −6.09 | −0.47 | −0.47 | 0.38 |

The third breast cancer dataset
| ID | BC/NoBC | 220471_s_at | 226899_at | 220987_s_at | 228880_at | CF1 | CF2 | CFmax | Pmax |
|---|---|---|---|---|---|---|---|---|---|
| GSM1045191 | 0 | 3.21 | 6.95 | 2.94 | 5.65 | −10.01 | −0.49 | −0.49 | 0.38 |
| GSM1045192 | 0 | 2.90 | 7.88 | 3.57 | 4.27 | −2.00 | −0.11 | −0.11 | 0.47 |
| GSM1045193 | 0 | 3.16 | 6.57 | 3.19 | 4.67 | −6.67 | −1.49 | −1.49 | 0.18 |
| ... |
| GSM1045207 | 0 | 2.31 | 6.68 | 3.41 | 7.61 | −3.40 | −4.63 | −3.40 | 0.03 |
| GSM1045208 | 1 | 2.31 | 8.56 | 4.02 | 4.43 | 5.08 | −0.73 | 5.08 | 0.99 |
| ... |
| GSM1045310 | 1 | 2.31 | 7.84 | 5.54 | 4.67 | 9.40 | −8.70 | 9.40 | 1.00 |
| GSM1045311 | 1 | 2.31 | 9.91 | 4.09 | 4.85 | 4.43 | 1.35 | 4.43 | 0.99 |

The fourth breast cancer dataset
| ID | BC/NoBC | MYCT1 | UNC5B | NUAK2 | NAT8L | CF1 | CF2 | CF3 | CFmax | Pmax |
|---|---|---|---|---|---|---|---|---|---|---|
| TCGA-E9-A1NI-01A | 1 | 1.82 | 3.40 | 2.24 | 1.08 | 5.09 | 2.82 | 8.38 | 8.38 | 1.00 |
| TCGA-A1-A0SP-01A | 1 | 1.48 | 2.74 | 1.27 | 3.16 | 1.05 | −0.01 | −0.82 | 1.05 | 0.74 |
| TCGA-BH-A1EU-11A | 0 | 3.34 | 2.01 | 1.12 | 2.21 | −2.79 | −2.39 | −4.23 | −2.39 | 0.08 |
| ... |
| TCGA-BH-A0BW-11A | 0 | 2.86 | 2.85 | 2.12 | 1.98 | 1.05 | 1.03 | 2.20 | 2.20 | 0.90 |
| ... |
| TCGA-BH-A0DK-11A | 0 | 2.08 | 1.77 | 0.56 | 0.94 | 0.68 | −3.57 | 0.69 | 0.69 | 0.67 |
| ... |
| TCGA-E2-A1LH-11A | 0 | 2.27 | 1.92 | 1.46 | 2.16 | 0.76 | −2.39 | 0.53 | 0.76 | 0.68 |
| ... |
| TCGA-AC-A2FF-11A | 0 | 2.91 | 3.18 | 1.18 | 1.33 | −0.46 | 1.31 | −0.19 | 1.31 | 0.79 |
| ... |
| TCGA-A7-A5ZW-01A | 1 | 2.42 | 3.96 | 1.52 | 0.42 | 2.69 | 4.01 | 5.31 | 5.31 | 1.00 |
| TCGA-BH-A203-01A | 1 | 1.74 | 2.87 | 1.44 | 0.79 | 3.77 | 0.54 | 5.94 | 5.94 | 1.00 |

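
The rows listed in the table are numerically consistent with CFmax being the largest of the competing factors (CF1, CF2, and CF3 where present) and Pmax being the logistic transform of CFmax. The record itself does not state these formulas, so the sketch below should be read as an assumption inferred from the tabulated values, not as the authors' definition. The function names are illustrative.

```python
import math


def cf_max(factors):
    """Largest competing factor for one sample (assumed definition of CFmax)."""
    return max(factors)


def risk_probability(cfmax):
    """Logistic transform of CFmax (assumed definition of Pmax)."""
    return 1.0 / (1.0 + math.exp(-cfmax))


# Competing factors copied from a few rows of the table above.
rows = {
    "GSM1974566": (3.37, 1.19),
    "GSM1589150": (-6.16, -0.26),
    "TCGA-BH-A1EU-11A": (-2.79, -2.39, -4.23),
}

for sample_id, factors in rows.items():
    cfmax = cf_max(factors)
    pmax = risk_probability(cfmax)
    print(f"{sample_id}: CFmax={cfmax:.2f} Pmax={pmax:.2f}")
```

Under this reading, a positive CFmax corresponds to a risk probability above 0.5 and a negative CFmax to a risk probability below 0.5.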
Figure 1. Risk probabilities of 4 cohorts. The circles are for patients with breast cancer. The asterisks are for tissues without breast cancer.
Figure 2. Venn diagrams of breast cancer subtypes. The first 3 cohorts have more than 3 subtypes. The fourth cohort has more than 7 subtypes.
Figure 3. Four-dimensional plots for visualizing risk signature patterns from 3 competing component classifiers and the combined functional effects of gene-gene interactions and gene-subtype interactions of 4 genes.
Characteristics of the first dataset samples (TNBC, a North American cohort).
| | Sex: Male | Sex: Female | Age: ⩽50 | Age: (50,60] | Age: (60,70] | Age: >70 | BMI: Normal | BMI: Overweight | BMI: Obese | Grade: Poorly | Grade: Moderately | Grade: Well |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| CF-1 | 0 | 63 | 34 | 12 | 10 | 5 | 13 | 19 | 61 | 33 | 25 | 1 |
| CF-2 | 0 | 2 | 1 | 0 | 1 | 0 | 0 | 1 | 2 | 1 | 0 | 0 |
| CF-(1,2) | 0 | 133 | 51 | 33 | 28 | 18 | 36 | 41 | 129 | 74 | 28 | 3 |
Characteristics of the second dataset samples (TNBC, a European cohort).
| | Age: ⩽50 | Age: (50,60] | Age: (60,70] | Age: >70 | BMI: Normal | BMI: Overweight | BMI: Obese |
|---|---|---|---|---|---|---|---|
| CF-1 | 3 | 4 | 2 | 2 | 5 | 3 | 3 |
| CF-2 | 2 | 2 | 0 | 1 | 2 | 2 | 1 |
| CF-(1,2) | 9 | 10 | 5 | 1 | 12 | 6 | 6 |
Characteristics of the third dataset samples (BC, a European cohort).
| | Age: ⩽50 | Age: (50,60] | Age: (60,70] | Age: >70 | Grade: Poorly | Grade: Moderately | Grade: Well |
|---|---|---|---|---|---|---|---|
| CF-1 | 9 | 15 | 11 | 9 | 25 | 14 | 5 |
| CF-2 | 3 | 4 | 2 | 0 | 4 | 5 | 0 |
| CF-(1,2) | 15 | 13 | 10 | 13 | 24 | 21 | 6 |
Characteristics of the fourth dataset samples (TCGA, Genomic Data Commons).
| | Age: ⩽50 | Age: (50,60] | Age: (60,70] | Age: (70,80] | Age: >80 | Sex: Male | Sex: Female | Stage: I | Stage: II | Stage: III | Stage: IV | Stage: X |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| CF-1 | 8 | 1 | 3 | 0 | 1 | 0 | 13 | 4 | 6 | 3 | 0 | 0 |
| CF-2 | 10 | 9 | 7 | 2 | 1 | 1 | 30 | 6 | 16 | 9 | 0 | 0 |
| CF-3 | 1 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 1 | 0 | 0 |
| CF-(1,2) | 5 | 1 | 4 | 1 | 1 | 0 | 12 | 2 | 7 | 2 | 0 | 1 |
| CF-(1,3) | 44 | 36 | 35 | 27 | 7 | 2 | 148 | 25 | 97 | 24 | 2 | 0 |
| CF-(2,3) | 2 | 5 | 5 | 1 | 0 | 0 | 13 | 3 | 3 | 5 | 1 | 1 |
| CF-(1,2,3) | 257 | 216 | 225 | 119 | 43 | 9 | 862 | 143 | 490 | 202 | 17 | 10 |
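
The row labels in the characteristics tables (CF-1, CF-2, CF-(1,2), up to CF-(1,2,3)) match the non-empty subsets of the component classifiers: 3 subsets for two components and 7 for three, which agrees with the "7 subtypes" mentioned in the abstract. The sketch below enumerates those labels. The `subtype_of` rule, which assigns a sample to the set of components whose competing factor is positive, is a hypothetical assignment rule added for illustration and is not stated in this record.

```python
from itertools import combinations


def format_label(combo):
    """Format a tuple of component indices in the tables' labelling style."""
    if len(combo) == 1:
        return f"CF-{combo[0]}"
    return "CF-(" + ",".join(str(i) for i in combo) + ")"


def subtype_labels(n_components):
    """All non-empty subsets of the components: 2**n - 1 labels."""
    indices = range(1, n_components + 1)
    return [
        format_label(combo)
        for size in range(1, n_components + 1)
        for combo in combinations(indices, size)
    ]


def subtype_of(factors):
    """Hypothetical rule: the subtype is the set of components with CF > 0."""
    positive = tuple(i for i, cf in enumerate(factors, start=1) if cf > 0)
    return format_label(positive) if positive else None


print(subtype_labels(2))
print(subtype_labels(3))
# Competing factors of TCGA-A1-A0SP-01A from the fourth dataset.
print(subtype_of((1.05, -0.01, -0.82)))
```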
Correlation coefficients between CFmax from the fourth data and 8 genes in the literature.
| | CFmax | BRCA1 | BRCA2 | PALB2 | BARD1 | RAD51C | RAD51D | ATM | CHEK2 |
|---|---|---|---|---|---|---|---|---|---|
| CFmax | 1.00 | .25 | .25 | .24 | .31 | .12 | .13 | −.12 | .21 |
| BRCA1 | | 1.00 | .48 | .52 | .50 | .26 | .50 | .14 | .28 |
| BRCA2 | | | 1.00 | .47 | .61 | .15 | .24 | .28 | .45 |
| PALB2 | | | | 1.00 | .53 | .19 | .32 | .30 | .20 |
| BARD1 | | | | | 1.00 | .17 | .27 | .21 | .41 |
| RAD51C | | | | | | 1.00 | .19 | −.15 | .17 |
| RAD51D | | | | | | | 1.00 | .14 | .14 |
| ATM | | | | | | | | 1.00 | −.15 |
| CHEK2 | | | | | | | | | 1.00 |
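
An upper-triangular table of this kind can be produced by computing the Pearson correlation coefficient for every pair of columns and keeping each pair once. The sketch below assumes the coefficients are Pearson correlations, which the record does not specify. The column names and values in `toy` are hypothetical placeholders, not data from the study.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def upper_triangle(columns):
    """Correlation for each unordered pair of columns, diagonal included."""
    names = list(columns)
    return {
        (a, b): round(pearson(columns[a], columns[b]), 2)
        for i, a in enumerate(names)
        for b in names[i:]
    }


# Hypothetical toy values: one score column and two expression columns.
toy = {
    "score": [2.0, 1.0, -1.5, 0.5, 3.0],
    "gene_a": [4.1, 3.9, 3.2, 3.6, 4.4],
    "gene_b": [7.0, 6.2, 6.9, 6.5, 6.1],
}

for (row, col), r in upper_triangle(toy).items():
    print(f"{row:>6} vs {col:<6} r={r:+.2f}")
```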