Abstract
Early detection (localized stage) of colon cancer is associated with a five-year survival rate of 91%. Only 39% of colon cancers, however, are diagnosed at that early stage. Early and accurate diagnosis is therefore a critical need and a decisive factor in the successful clinical treatment of colon cancer. In this study, using supervised linear discriminant analysis, we developed three diagnostic biomarker models that, based on global micro-RNA expression analysis of colonic tissue collected during surgery, discriminate with perfect accuracy between subjects with colon cancer (stages II-IV) and normal healthy subjects. We developed the three models with 57 subjects [40 with colon cancer (stages II-IV) and 17 normal] and validated them with 39 unknown (new and different) subjects [28 with colon cancer (stages II-IV) and 11 normal]. For all three diagnostic models, both the overall sensitivity and specificity were 100%. The nine most significant micro-RNAs identified, which comprise the input variables to the three linear discriminant functions, are associated with genes that regulate oncogenesis, and they play a paramount role in the development of colon cancer, as evidenced in the tumor tissue itself. This could have a significant impact in the fight against this disease, in that it may lead to the development of an early serum or blood diagnostic test based on the detection of those nine key micro-RNAs.
Keywords: ROC-supervised linear discriminant analysis; biomarkers; colon cancer; diagnostic biomarker models; global micro-RNA expression analysis; systems biology
Year: 2011 PMID: 22259227 PMCID: PMC3256938 DOI: 10.4137/CIN.S8779
Source DB: PubMed Journal: Cancer Inform ISSN: 1176-9351
The 12 miRNAs (constituent variables) of the three diagnostic biomarker models (D1, D2, and D3), ranked according to their ROC AUC value.
| Rank | ROC AUC | miRNA symbol | miRNA Signif. Diff. Expr. (CCA) | Known gene interactions | Observed processes | Known drugs/Chemicals/Hormones |
|---|---|---|---|---|---|---|
| 1 | 0.99813 | | ↑ | | colon cancer | |
| 2 | 0.99440 | | ↑ | RASA1, RAC1, TP53I13, EGR3, BCL10 | endometrial ovarian cancer, endometrioid carcinoma, lung cancer, lung squamous cell carcinoma, metastasis, colon cancer | 5-fluorouracil, vorinostat, trichostatin A, 25-hydroxy-vitamin D3, decitabine |
| 3 | 0.99360 | | ↓ | AQP4, CDK6, CYR61, FMR1, SLC7A6, THBS1, TMEM2, TUBA1A, VEZT, WDR82 | head and neck cancer, hypopharyngeal squamous cell carcinoma, uterine cancer, cervical carcinoma | vorinostat, docetaxel, Insulin |
| 4 | 0.99307 | | ↓ | BCL7B, TP53RK, VEGFA, PNP, FAM123C, SLC25A34, C17orf55, ONECUT2, PDLIM2, SLC8A3, HPCAL4, AHCYL1, BCOR, MAPRE3, RGS17, DGKG, SLC10A7 | hepatocellular carcinoma, liver cancer | 5-fluorouracil |
| 5 | 0.99200 | | ↑ | BCL10, BCL11B, BCLAF1, RBM8A, FOXO1, SRSF2, PDCD4, TRIM2, NUFIP2, GREM2, VAMP7, HECTD2, KY, PPP2R2A, CTDSPL, PLA2G2D, CDH9, THEM4, C16orf54, TUB, TUBA8, TUBGCP4 | liver cancer, endometrial ovarian cancer, endometrioid carcinoma, head and neck cancer, hypopharyngeal squamous cell carcinoma, hepatocellular carcinoma, melanoma metastases, melanoma, lung squamous cell carcinoma, metastasis, lung cancer | vorinostat |
| 6 | 0.99147 | | ↓ | RASAL1, METTL4, RASAL1, RBBP9, RBM12, TUSC2, SUFU, SOST, TBX3, NR2C2, FAM60A, SH3GLB1, SDCCAG3, BCL2L2, TRAF5, SHANK3, CSNK1G2, SLC7A6OS, WRB, BEND4 | head and neck cancer, hypopharyngeal squamous cell carcinoma | 4-hydroxynonenal, 25-hydroxy-vitamin D3 |
| 7 | 0.98694 | HS-94 | ↑ | | | |
| 8 | 0.98507 | miR-135b | ↑ | RASAL2, RASSF2, TP53TG3/TP53TG3B, BCL11A, BCL11B, BCL2L2, BCL9L, SMAD5, APC, JAK2, ALOX5AP, SUCLG2, ZNF28, OSCP1, PDCD6IP, C1QBP, GULP1, KCTD1, YBX2, OTUD6B, BHLHB9, DEPTOR, NR3C2, RUNX2 | liver cancer, hepatocellular carcinoma, melanoma metastases, melanoma, clear-cell adenocarcinoma, renal cancer, uterine cancer, cervical carcinoma | perchlorate, methimazole, tretinoin |
| 9 | 0.98028 | | ↑ | RASA4/RASA4B, RASD1, RASEF, RASGRP1, TP53INP1, METAP2, METTL10, BCL2L11, BEND2, BEND6, API5, ZC3HC1, OSBPL11, NPY1R, ATG2B, BNIP3L, GPR137C, ZNF585A, TSPYL5, USP6NL, PLCD3, METAP2, C3orf64, ZNF140, ANKS1A, PAK4, MMP9 | papillary thyroid cancer, papillary thyroid carcinoma, endometrial cancer, pancreatic cancer, pancreatic ductal adenocarcinoma, lung squamous cell carcinoma, lung cancer | docetaxel, lipopolysaccharide, fluorouracil |
| 10 | 0.97735 | | ↓ | RASA1, RASAL2, RASD1, RASGEF1A, RASGRP3, RASL12, RASSF4, TP53, TP53INP1, TP63, BCL10, BCL11A, BCL11B, BCL2, BCL2L11, BCL2L15, BCL6, BCL9, BCLAF1, EED, ACTBL2, ACTC1, ACTN1, NEFL, NEFM, GNAI2, NEUROD1, TMED2, TMED10, CHD1, CBFB, RAD23B, AP2A1, SLC7A11, SLC4A7, MBNL1, TNRC6A, NUFIP2, P4HA2, NT5E, BDNF, RUNX2 | uterine cancer, liver cancer, uterine leiomyoma, papillary thyroid cancer, papillary thyroid carcinoma, head and neck cancer, hypopharyngeal squamous cell carcinoma, prostate cancer, cervical carcinoma, lung cancer, brain cancer, medulloblastoma, colorectal cancer, hepatocellular carcinoma, early-onset breast cancer, breast cancer, hormone-dependent breast cancer, breast carcinoma | acetaminophen, 5-fluorouracil, docetaxel, oxaliplatin, 25-hydroxy-vitamin D3, Gulo, Hcg (chorionic gonadotropin complex), trichostatin A, ethanol, androgen, valproic acid |
| 11 | 0.97601 | | ↓ | RASGRP3, RASIP1, TP53TG3/TP53TG3B, TP63, BCL11A, BCL11B, BCL2L11, BCL2L13, MAF, MAFK, MED1, MED11, MED14, MED27, MEF2A, MEGF11, MEGF9, METAP1, METTL8, METTL9, ACTBL2, ACTC1, ACTN2, BCL11A, BCL11B, BCL2L11, BCL2L13, TNFAIP1, TNFAIP6, TNFAIP8, TNFSF10, TUBB1, EGR2, RB1, CDK2, CDK6, MET, MITF, CDK6, E2F6, NCOA2, SNAPC1, PLA2G15, STC2, DNAJB12, SSR1, RELL1, SLC6A17, C7orf28B/CCZ1, ASH1L, TMEM229B, C18orf1 | hepatocellular carcinoma, liver cancer | decitabine, trichostatin A, phorbol myristate acetate |
| 12 | 0.97015 | miR-493-5p | ↑ | | | |
Note: Their significant differential expression [over-expression (↑) or under-expression (↓)] as observed in the CCA group relative to the NRM group is shown, along with their symbol, known gene interactions, the processes wherein they have been observed to be involved, and known drug/chemical/hormone interactions.
Figure 1. Scatter plot and bar graph of all 57 original subjects (40 CCA and 17 NRM) used in the Discovery Study in connection with the D1 and D2 diagnostic biomarker models.
Notes: As can be seen, 40/40 CCA subjects (purple color) had D1 and D2 scores lower than the determined cut-off scores of 21.800 and 21.235, respectively; therefore, 40/40 CCA subjects were identified correctly by both D1 and D2 diagnostic biomarker models [sensitivity = 40/40 = 1.000 for both D1 and D2]. Regarding the NRM group (green color), all 17 subjects had D1 and D2 scores greater than the determined cut-off scores of 21.800 and 21.235, respectively; therefore, 17/17 NRM subjects were identified correctly by both D1 and D2 diagnostic biomarker models [specificity = 17/17 = 1.000 for both D1 and D2]. For the Discovery Study, the mean D1 and D2 scores of the 40 CCA subjects were 18.4054 and 18.3266 respectively (top of the D1 and D2 purple bars) and their respective standard deviations (whiskers above or below the top of the D1 and D2 purple bars) were 1.0899 and 1.0703. The mean D1 and D2 scores of the 17 NRM subjects were 23.7523 and 23.9373 respectively (top of the D1 and D2 green bars) and their respective standard deviations (whiskers above or below the top of the D1 and D2 green bars) were 0.7363 and 0.8029. The significance level was set at α = 0.001 (two-tailed), and the probability of significance for the D1 was P = 3.05 × 10^−25 (independent t-Test with T-value = 18.4664), whereas the probability of significance for the D2 was P = 3.01 × 10^−26 (independent t-Test with T-value = 19.3834). Both the D1 and the D2 are parametrically distributed with respect to both groups.
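The reported T-value for D1 can be approximately recomputed from the group means, standard deviations, and sample sizes quoted above. The following is a minimal sketch, assuming the independent t-Test used the pooled (equal-variance) form; the source does not state which variant was used, and the function name `pooled_t` is hypothetical.

```python
import math


def pooled_t(mean_a, sd_a, n_a, mean_b, sd_b, n_b):
    """Two-sample t statistic with pooled variance (assumed variant)."""
    df = n_a + n_b - 2
    # Weighted average of the two sample variances.
    pooled_var = ((n_a - 1) * sd_a ** 2 + (n_b - 1) * sd_b ** 2) / df
    # Standard error of the difference between the two group means.
    se = math.sqrt(pooled_var * (1 / n_a + 1 / n_b))
    return (mean_a - mean_b) / se, df


# D1, Discovery Study: NRM group (n = 17) versus CCA group (n = 40).
t_d1, df_d1 = pooled_t(23.7523, 0.7363, 17, 18.4054, 1.0899, 40)
print(f"t = {t_d1:.3f}, df = {df_d1}")
```

Under this assumption the result agrees with the reported T-value of 18.4664 to about two decimal places; the small residual is consistent with rounding of the published means and standard deviations.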
Statistical results of the three diagnostic biomarker models (D1, D2, and D3) in the Discovery Study (identification of the 57 original subjects) and in the Validation Study (identification of the 39 unknown subjects, which were new and different from the 57 original subjects).
| Diagnostic Test | ROC AUC | T-Value | P (2-tailed), α = 0.001 | CCA Group | NRM Group |
|---|---|---|---|---|---|
| (A) Discovery Study | | | | [99.99% CI of mean] (SD) | [99.99% CI of mean] (SD) |
| D1 | 1.000 | 18.4664 | 3.05 × 10^−25 | [17.8457, 18.9607] (1.0899) | [23.2097, 24.3414] (0.7363) |
| D2 | 1.000 | 19.3834 | 3.01 × 10^−26 | [17.8040, 18.9029] (1.0703) | [23.3861, 24.5940] (0.8029) |
| D3 | 1.000 | 23.1476 | 4.96 × 10^−30 | [17.4960, 18.4864] (0.9684) | [23.7995, 25.4473] (1.0730) |
| (B) Validation Study | | | | Mean ± SD | Mean ± SD |
| D1 | 1.000 | 10.8991 | 4.17 × 10^−13 | 18.5568 ± 1.4817 | 23.7912 ± 0.9013 |
| D2 | 1.000 | 12.4374 | 8.76 × 10^−15 | 18.5869 ± 1.1167 | 23.4817 ± 1.0766 |
| D3 | 1.000 | 12.9987 | 2.30 × 10^−15 | 18.1475 ± 1.2818 | 24.5298 ± 1.6149 |
Notes: (A) The ROC AUC value, the T value and probability of significance (P) of the independent t-Test, the 99.99% confidence interval for the mean score of the CCA group and that of the NRM group, along with their respective standard deviations, of the D1, D2, and D3 diagnostic biomarker models in the Discovery Study are shown. (B) The ROC AUC value, the T value and probability of significance (P) of the independent t-Test, and the mean score of the CCA group and that of the NRM group, along with their respective standard deviations, of the D1, D2, and D3 diagnostic biomarker models in the Validation Study are shown. As can be seen, all six of those group mean scores, as observed in the validation study with the 39 unknown subjects, fall within the 99.99% confidence intervals of the respective group mean scores as predicted in the discovery study (A).
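The closing claim of the notes, that all six validation group means fall inside the corresponding discovery 99.99% confidence intervals, can be checked directly from the tabulated values. A minimal sketch follows; the dictionary names are illustrative and not from the source.

```python
# 99.99% confidence intervals of the group means, Discovery Study (part A).
discovery_ci = {
    ("D1", "CCA"): (17.8457, 18.9607),
    ("D1", "NRM"): (23.2097, 24.3414),
    ("D2", "CCA"): (17.8040, 18.9029),
    ("D2", "NRM"): (23.3861, 24.5940),
    ("D3", "CCA"): (17.4960, 18.4864),
    ("D3", "NRM"): (23.7995, 25.4473),
}

# Observed group means, Validation Study (part B).
validation_mean = {
    ("D1", "CCA"): 18.5568,
    ("D1", "NRM"): 23.7912,
    ("D2", "CCA"): 18.5869,
    ("D2", "NRM"): 23.4817,
    ("D3", "CCA"): 18.1475,
    ("D3", "NRM"): 24.5298,
}

# For each model and group, test whether the validation mean lies in the interval.
within = {
    key: low <= validation_mean[key] <= high
    for key, (low, high) in discovery_ci.items()
}
all_within = all(within.values())
print(within)
print("all six within interval:", all_within)
```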
Figure 2. Scatter plot and bar graph of all 57 original subjects (40 CCA and 17 NRM) used in the Discovery Study in connection with the D3 diagnostic biomarker model.
Notes: As can be seen, 40/40 CCA subjects (purple color) had D3 scores lower than the determined cut-off score of 21.382; therefore, 40/40 CCA subjects were identified correctly by the D3 diagnostic biomarker model [sensitivity = 40/40 = 1.000]. Regarding the NRM group (green color), all 17 subjects had D3 scores greater than the determined cut-off score of 21.382; therefore, 17/17 NRM subjects were identified correctly by the D3 diagnostic biomarker model [specificity = 17/17 = 1.000]. For the Discovery Study, the mean D3 score of the 40 CCA subjects was 18.0010 (top of the purple bar) and the standard deviation (whiskers above or below the top of the purple bar) was 0.9684. The mean D3 score of the 17 NRM subjects was 24.7016 (top of the green bar) and the standard deviation (whiskers above or below the top of the green bar) was 1.0730. The significance level was set at α = 0.001 (two-tailed), and the probability of significance for the D3 was P = 4.96 × 10^−30 (independent t-Test with T-value = 23.1476). The D3 is parametrically distributed with respect to both groups.
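The cut-off rule described in these notes (a score below the cut-off is called CCA, otherwise NRM) and the resulting sensitivity and specificity can be sketched as follows. The per-subject scores are not given in this record, so the scores and labels below are hypothetical and purely illustrative; only the D3 cut-off of 21.382 comes from the source. One hypothetical CCA score is deliberately placed above the cut-off to show how a miss lowers sensitivity.

```python
D3_CUTOFF = 21.382  # cut-off score reported for the D3 model


def classify(score, cutoff):
    """Scores below the cut-off are called CCA, all others NRM."""
    return "CCA" if score < cutoff else "NRM"


def sensitivity_specificity(scores, labels, cutoff):
    """Return (sensitivity, specificity), treating CCA as the positive class."""
    calls = [classify(s, cutoff) for s in scores]
    tp = sum(c == "CCA" and y == "CCA" for c, y in zip(calls, labels))
    tn = sum(c == "NRM" and y == "NRM" for c, y in zip(calls, labels))
    n_pos = labels.count("CCA")
    n_neg = labels.count("NRM")
    return tp / n_pos, tn / n_neg


# Hypothetical scores (not study data): four CCA subjects, two NRM subjects.
scores = [18.0, 17.2, 19.1, 21.9, 24.7, 23.9]
labels = ["CCA", "CCA", "CCA", "CCA", "NRM", "NRM"]

sens, spec = sensitivity_specificity(scores, labels, D3_CUTOFF)
print(f"sensitivity = {sens:.3f}, specificity = {spec:.3f}")
```

With the study's own data, every CCA score fell below the cut-off and every NRM score above it, which is what yields the reported sensitivity and specificity of 1.000.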
Figure 3. 3D scatter plot of all 57 original subjects [40 CCA (purple) and 17 NRM (green)] used in the Discovery Study in connection with the D1, D2, and D3 diagnostic biomarker models.
Notes: The D1, D2, and D3 scores of all 57 original subjects are plotted against each other (D1 vs. D2 vs. D3). As can be seen, there are two distinct, separate clusters: the purple one (CCA group) is at the front and at a lower level, whereas the green one (NRM group) is at the back and at a higher level. It can also be seen that there were no misclassifications.
Figure 4. Scatter plot and bar graph of all 39 unknown (new and different) subjects (28 CCA and 11 NRM) used in the Validation Study in connection with the D1 and D2 diagnostic biomarker models.
Notes: As can be seen, 28/28 unknown CCA subjects (purple color) had D1 and D2 scores lower than the determined cut-off scores of 21.800 and 21.235, respectively; therefore, 28/28 unknown CCA subjects were identified correctly by both D1 and D2 diagnostic biomarker models [sensitivity = 28/28 = 1.000 for both D1 and D2]. Regarding the NRM group (green color), all 11 unknown subjects had D1 and D2 scores greater than the determined cut-off scores of 21.800 and 21.235, respectively; therefore, 11/11 unknown NRM subjects were identified correctly by both D1 and D2 diagnostic biomarker models [specificity = 11/11 = 1.000 for both D1 and D2]. For the Validation Study, the mean D1 and D2 scores of the 28 unknown CCA subjects were 18.5568 and 18.5869 respectively (top of the D1 and D2 purple bars) and their respective standard deviations (whiskers above or below the top of the D1 and D2 purple bars) were 1.4817 and 1.1167. The mean D1 and D2 scores of the 11 unknown NRM subjects were 23.7912 and 23.4817 respectively (top of the D1 and D2 green bars) and their respective standard deviations (whiskers above or below the top of the D1 and D2 green bars) were 0.9013 and 1.0766. The significance level was set at α = 0.001 (two-tailed), and the probability of significance for the D1 was P = 4.17 × 10^−13 (independent t-Test with T-value = 10.8991), whereas the probability of significance for the D2 was P = 8.76 × 10^−15 (independent t-Test with T-value = 12.4374). Both the D1 and the D2 are parametrically distributed with respect to both groups.
Figure 5. Scatter plot and bar graph of all 39 unknown (new and different) subjects (28 CCA and 11 NRM) used in the Validation Study in connection with the D3 diagnostic biomarker model.
Notes: As can be seen, 28/28 unknown CCA subjects (purple color) had D3 scores lower than the determined cut-off score of 21.382; therefore, 28/28 unknown CCA subjects were identified correctly by the D3 diagnostic biomarker model [sensitivity = 28/28 = 1.000]. Regarding the NRM group (green color), all 11 unknown subjects had D3 scores greater than the determined cut-off score of 21.382; therefore, 11/11 unknown NRM subjects were identified correctly by the D3 diagnostic biomarker model [specificity = 11/11 = 1.000]. For the Validation Study, the mean D3 score of the 28 unknown CCA subjects was 18.1475 (top of the purple bar) and the standard deviation (whiskers above or below the top of the purple bar) was 1.2818. The mean D3 score of the 11 unknown NRM subjects was 24.5298 (top of the green bar) and the standard deviation (whiskers above or below the top of the green bar) was 1.6149. The significance level was set at α = 0.001 (two-tailed), and the probability of significance for the D3 was P = 2.30 × 10^−15 (independent t-Test with T-value = 12.9987). The D3 is parametrically distributed with respect to both groups.
Figure 6. 3D scatter plot of all 39 unknown (new and different) subjects [28 CCA (purple) and 11 NRM (green)] used in the Validation Study in connection with the D1, D2, and D3 diagnostic biomarker models.
Notes: The D1, D2, and D3 scores of all 39 unknown subjects are plotted against each other (D1 vs. D2 vs. D3). As can be seen, there are two distinct, separate clusters: the purple one (CCA group) is at the front and at a lower level, whereas the green one (NRM group) is at the back and at a higher level. It can also be seen that there were no misclassifications.
Canonical linear discriminant functions of D1, D2, and D3 diagnostic biomarker models developed from the original 57 subjects [17 NRM (Group 0) and 40 CCA (Group 1)].
| Discriminant Analysis Report | |||
|---|---|---|---|
| Group | 0 | 1 | Overall |
| Count | 17 | 40 | 57 |
Notes: The constituent miRNA variables, their respective coefficients, and the constant of each of the three canonical linear discriminant functions (D1, D2, and D3) are shown. The letter ‘T’ preceding the name of a miRNA indicates that that miRNA variable was transformed in order to meet normality, equality of variance, and/or equality of covariance requirements.
Test results for equality of covariance and variance among the constituent miRNA variables of the D1, D2, and D3 functions developed from the original 57 subjects [17 NRM (Group 0) and 40 CCA (Group 1)].
| Equality of Covariance and Variance Report | |||
|---|---|---|---|
| Group | 0 | 1 | Overall |
| Count | 17 | 40 | 57 |
Notes: As can be seen from the probability of significance values of both the F and the χ2 tests for the Box’s M test, there are no statistically significant covariance differences among the constituent miRNA variables of the D1, D2, or D3 function. Likewise, the Bartlett test shows that there are no statistically significant variance differences among the constituent miRNA variables of the D1, D2, or D3 function.
Normality test results for the D1, D2, and D3 linear discriminant functions with respect to both groups of the original 57 subjects [17 NRM (Group 0) and 40 CCA (Group 1)] used for the development of the three functions.
| Normality Tests Report | |||||
|---|---|---|---|---|---|
| Test name | Test value | Prob level | 10% Critical value | 5% Critical value | Decision (5%) |
| Shapiro-Wilk W | 0.9679477 | 0.7815679 | | | Can’t reject normality |
| Anderson-Darling | 0.3600979 | 0.4483844 | | | Can’t reject normality |
| Martinez-Iglewicz | 1.135777 | | 1.252524 | 1.438767 | Can’t reject normality |
| Kolmogorov-Smirnov | 0.1029178 | | 0.19 | 0.207 | Can’t reject normality |
| D’Agostino Skewness | −0.6319371 | 0.527428 | 1.645 | 1.960 | Can’t reject normality |
| D’Agostino Kurtosis | 0.9578 | 0.338181 | 1.645 | 1.960 | Can’t reject normality |
| D’Agostino Omnibus | 1.3167 | 0.517716 | 4.605 | 5.991 | Can’t reject normality |
| Shapiro-Wilk W | 0.9523966 | 9.170641E-02 | | | Can’t reject normality |
| Anderson-Darling | 0.4800356 | 0.233547 | | | Can’t reject normality |
| Martinez-Iglewicz | 0.9609824 | | 1.114676 | 1.175041 | Can’t reject normality |
| Kolmogorov-Smirnov | 0.0905983 | | 0.126 | 0.139 | Can’t reject normality |
| D’Agostino Skewness | −0.3980126 | 0.6906209 | 1.645 | 1.960 | Can’t reject normality |
| D’Agostino Kurtosis | −2.4009 | 0.016356 | 1.645 | 1.960 | Reject normality |
| D’Agostino Omnibus | 5.9226 | 0.051752 | 4.605 | 5.991 | Can’t reject normality |
| Shapiro-Wilk W | 0.9018213 | 7.286435E-02 | | | Can’t reject normality |
| Anderson-Darling | 0.6532255 | 8.824592E-02 | | | Can’t reject normality |
| Martinez-Iglewicz | 1.067013 | | 1.252524 | 1.438767 | Can’t reject normality |
| Kolmogorov-Smirnov | 0.1256069 | | 0.19 | 0.207 | Can’t reject normality |
| D’Agostino Skewness | −1.408385 | 0.159017 | 1.645 | 1.960 | Can’t reject normality |
| D’Agostino Kurtosis | −0.4372 | 0.661989 | 1.645 | 1.960 | Can’t reject normality |
| D’Agostino Omnibus | 2.1747 | 0.337114 | 4.605 | 5.991 | Can’t reject normality |
| Shapiro-Wilk W | 0.9654804 | 0.2565536 | | | Can’t reject normality |
| Anderson-Darling | 0.4056016 | 0.3517282 | | | Can’t reject normality |
| Martinez-Iglewicz | 1.038377 | | 1.114676 | 1.175041 | Can’t reject normality |
| Kolmogorov-Smirnov | 7.907125E-02 | | 0.126 | 0.139 | Can’t reject normality |
| D’Agostino Skewness | −1.585528 | 0.1128464 | 1.645 | 1.960 | Can’t reject normality |
| D’Agostino Kurtosis | 0.4021 | 0.687630 | 1.645 | 1.960 | Can’t reject normality |
| D’Agostino Omnibus | 2.6756 | 0.262427 | 4.605 | 5.991 | Can’t reject normality |
| Shapiro-Wilk W | 0.9496766 | 0.4514251 | | | Can’t reject normality |
| Anderson-Darling | 0.3490809 | 0.4751235 | | | Can’t reject normality |
| Martinez-Iglewicz | 1.136325 | | 1.252524 | 1.438767 | Can’t reject normality |
| Kolmogorov-Smirnov | 0.1442362 | | 0.19 | 0.207 | Can’t reject normality |
| D’Agostino Skewness | 1.580456 | 0.1140025 | 1.645 | 1.960 | Can’t reject normality |
| D’Agostino Kurtosis | 1.5018 | 0.133142 | 1.645 | 1.960 | Can’t reject normality |
| D’Agostino Omnibus | 4.7533 | 0.092860 | 4.605 | 5.991 | Can’t reject normality |
| Shapiro-Wilk W | 0.9784388 | 0.6317195 | | | Can’t reject normality |
| Anderson-Darling | 0.2572377 | 0.7206884 | | | Can’t reject normality |
| Martinez-Iglewicz | 0.9622557 | | 1.114676 | 1.175041 | Can’t reject normality |
| Kolmogorov-Smirnov | 7.959955E-02 | | 0.126 | 0.136 | Can’t reject normality |
| D’Agostino Skewness | 0.802801 | 0.4220898 | 1.645 | 1.960 | Can’t reject normality |
| D’Agostino Kurtosis | −0.6426 | 0.520487 | 1.645 | 1.960 | Can’t reject normality |
| D’Agostino Omnibus | 1.0574 | 0.589366 | 4.605 | 5.991 | Can’t reject normality |
Note: As can be seen, D1, D2, and D3 are normally distributed with respect to both groups.
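The Decision (5%) column for the critical-value tests can be reproduced by a simple comparison against the 5% critical value. The sketch below is illustrative only: it assumes a two-sided rule for the D’Agostino skewness and kurtosis z statistics and an upper-tail rule for the Omnibus, Kolmogorov-Smirnov, and Martinez-Iglewicz statistics. These rules are an assumption about how the report was generated and are not stated in the source; the function names are hypothetical.

```python
REJECT = "Reject normality"
KEEP = "Can't reject normality"


def decide_two_sided_z(z, critical=1.960):
    """Assumed rule for the skewness and kurtosis z tests: compare |z| to the critical value."""
    return REJECT if abs(z) > critical else KEEP


def decide_upper_tail(stat, critical):
    """Assumed rule for the Omnibus, K-S, and M-I tests: reject when the statistic exceeds the critical value."""
    return REJECT if stat > critical else KEEP


# Three rows from the second block of the report above.
kurtosis_decision = decide_two_sided_z(-2.4009)        # |z| exceeds 1.960
omnibus_decision = decide_upper_tail(5.9226, 5.991)    # just below the critical value
ks_decision = decide_upper_tail(0.0905983, 0.139)

print(kurtosis_decision, omnibus_decision, ks_decision, sep="\n")
```

Under these assumed rules, the single "Reject normality" entry in the table (D’Agostino Kurtosis, −2.4009) is the only row whose statistic crosses its 5% critical value, while the omnibus test for the same block stays just below its threshold.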