Katia Pane, Mario Zanfardino, Anna Maria Grimaldi, Gustavo Baldassarre, Marco Salvatore, Mariarosaria Incoronato, Monica Franzese.
Abstract
Big data processing, using omics data integration and machine learning (ML) methods, drives efforts to discover diagnostic and prognostic biomarkers for clinical decision making. Previously, we used the TCGA database for gene expression profiling of breast, ovarian, and endometrial cancers, and identified a top-scoring network centered on the ERBB2 gene, which plays a crucial role in carcinogenesis in these three estrogen-dependent tumors. Here, we focused on microRNA (miRNA) expression signature similarity, asking whether the shared miRNAs could target the ERBB family. We applied an ML approach to integrated TCGA miRNA profiling of breast, endometrial, and ovarian cancer to identify common miRNA signatures differentiating tumor and normal conditions. Using the ML-based algorithm and the miRTarBase database, we found 205 features and 158 miRNAs targeting ERBB isoforms, respectively. By merging the results of both approaches and ranking each feature according to the weighted Support Vector Machine model, we prioritized 42 features, with accuracy (0.98), AUC (0.93; 95% CI 0.917-0.94), sensitivity (0.85), and specificity (0.99), indicating their diagnostic capability to discriminate between the two conditions. In vitro validation by qRT-PCR, using model and parental cell lines for each tumor type, showed that five miRNAs (hsa-mir-323a-3p, hsa-mir-323b-3p, hsa-mir-331-3p, hsa-mir-381-3p, and hsa-mir-1301-3p) had concordant expression trends across breast, ovarian, and endometrial cancer cell lines compared with normal lines, confirming our in silico predictions. This shows that an integrated computational approach combined with biological knowledge could identify expression signatures as potential diagnostic biomarkers common to multiple tumors.
Keywords: ERBB family; TCGA; bioinformatics; estrogen-dependent cancer; female-specific cancers; machine learning; microRNAs
Year: 2022 PMID: 35740327 PMCID: PMC9219956 DOI: 10.3390/biomedicines10061306
Source DB: PubMed Journal: Biomedicines ISSN: 2227-9059
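The accuracy, sensitivity, and specificity quoted in the abstract are all ratios of confusion-matrix counts. The minimal sketch below shows how the three relate when the two classes are unbalanced. The function name and the counts are hypothetical, chosen only so the ratios resemble the reported values; they are not the study's confusion matrix, and the source does not state which condition was treated as the positive class.

```python
def diagnostic_metrics(tp: int, fp: int, tn: int, fn: int) -> dict:
    """Return accuracy, sensitivity, and specificity from confusion-matrix counts."""
    total = tp + fp + tn + fn
    return {
        "accuracy": (tp + tn) / total,
        "sensitivity": tp / (tp + fn),  # true-positive rate
        "specificity": tn / (tn + fp),  # true-negative rate
    }


# Hypothetical counts (assumption): 100 positives and 1300 negatives.
metrics = diagnostic_metrics(tp=85, fp=13, tn=1287, fn=15)
for name, value in metrics.items():
    print(f"{name}: {value:.2f}")
```

With a large negative class, accuracy tracks specificity closely, which is why a sensitivity of 0.85 can coexist with an accuracy of 0.98.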
Figure 1. Workflow. The study design encompasses the integration of TCGA miRNA expression data and miRNAs targeting ERBB family genes within a machine learning approach to prioritize common miRNA signatures. Abbreviations: The Cancer Genome Atlas (TCGA), Breast invasive carcinoma (BRCA), Ovarian Serous Cystadenocarcinoma Cancer (OV), Uterine Corpus Endometrial Carcinoma (UCEC), miRNA Target Interaction (miRTarBase) Database, the Cancer Cell Line Encyclopedia (CCLE) dataset (PRJNA523380), miRNA expression dataset.
The selected 42 features (miRNAs) associated with breast, ovarian, and uterine corpus endometrial cancers, with the corresponding feature source, and fold-change in expression (tumor vs. available normal sample normalized read counts), ranked according to the weighted support vector machine (wSVM) model. The miRNAs with ID indicated in bold were evaluated for in vitro validations. The importance weighting column indicates every feature’s weight.
| Model Rank | miRNA ID | Importance Weighting | Feature Source a | FC (BRCA) | FC (UCEC) |
|---|---|---|---|---|---|
| 1 | hsa-mir-183 | 100 | Top 15 | 8.49 | 23.62 |
| 2 | hsa-mir-139 | 93.87 | Top 15 | −7.03 | −11.46 |
| 3 | hsa-mir-145 | 89.75 | Top 15, Targeting ERBB | −5.05 | −10.80 |
| 4 | hsa-mir-10b | 85.99 | Top 15 | −3.20 | −6.57 |
| 5 | hsa-mir-337 | 84.56 | Top 15 | −3.72 | −3.01 |
| 6 | hsa-mir-200c | 83.26 | Top 15 | 3.11 | 3.38 |
| 7 | hsa-mir-200a | 82.17 | Top 15 | 4.72 | 6.26 |
| 8 | hsa-mir-100 | 81.09 | Top 15 | −2.93 | −11.61 |
| 9 | | 80.4 | Top 15 | −2.60 | −11.19 |
| 10 | hsa-mir-195 | 78.89 | Top 15 | −2.33 | −6.27 |
| 11 | hsa-mir-379 | 77.99 | Top 15 | −1.96 | −4.86 |
| 12 | | 77.47 | Top 15 | 3.59 | 3.36 |
| 13 | hsa-mir-210 | 76.05 | Top 15 | 7.42 | 6.79 |
| 14 | hsa-mir-200b | 75.21 | Top 15 | 3.36 | 4.50 |
| 15 | | 75.13 | Top 15 | −2.05 | −7.24 |
| 16 | hsa-mir-143 | 71.74 | Targeting ERBB | −1.83 | −11.42 |
| 18 | hsa-mir-130b | 70.99 | Targeting ERBB | 2.65 | 3.78 |
| 27 | hsa-mir-3127 | 59.62 | Targeting ERBB | 2.37 | 1.95 |
| 30 | hsa-mir-125b-1 | 57.64 | Targeting ERBB | −3.22 | −5.32 |
| 40 | | 52.59 | Targeting ERBB | 1.94 | 1.65 |
| 43 | hsa-mir-134 | 50.36 | Targeting ERBB | −1.59 | −2.31 |
| 44 | hsa-mir-155 | 50.25 | Targeting ERBB | 2.72 | 2.28 |
| 54 | hsa-mir-199b | 46.53 | Targeting ERBB | 1.01 | −3.24 |
| 58 | hsa-mir-199a-1 | 45.39 | Targeting ERBB | 1.10 | −3.35 |
| 76 | hsa-mir-22 | 41.32 | Targeting ERBB | −1.40 | −1.83 |
| 107 | hsa-mir-21 | 32.12 | Targeting ERBB | 5.01 | 1.03 |
| 108 | hsa-mir-375 | 31.5 | Targeting ERBB | 7.51 | 2.39 |
| 113 | | 29.65 | Targeting ERBB | 1.46 | 1.10 |
| 114 | hsa-mir-326 | 29.11 | Targeting ERBB | −2.25 | −1.14 |
| 141 | hsa-mir-301a | 21.2 | Targeting ERBB | 3.32 | 1.90 |
| 147 | | 18.69 | Targeting ERBB | 2.67 | −1.51 |
| 152 | | 16.72 | Targeting ERBB | 1.19 | 3.21 |
| 154 | | 16.39 | Targeting ERBB | −2.02 | −1.58 |
| 156 | hsa-mir-205 | 15.28 | Targeting ERBB | −2.67 | 51.59 |
| 157 | hsa-mir-25 | 15.11 | Targeting ERBB | −1.04 | −1.01 |
| 162 | hsa-mir-328 | 13.21 | Targeting ERBB | −1.87 | −1.95 |
| 168 | hsa-mir-125a | 10.71 | Targeting ERBB | −1.31 | −2.85 |
| 175 | hsa-mir-221 | 8.38 | Targeting ERBB | 1.03 | −3.12 |
| 180 | hsa-mir-146a | 7.77 | Targeting ERBB | 1.51 | 2.70 |
| 185 | hsa-mir-34a | 6.75 | Targeting ERBB | 1.21 | −1.23 |
| 196 | hsa-mir-24-1 | 2.36 | Targeting ERBB | 1.03 | −1.54 |
| 205 | | 0.11 | Targeting ERBB | −1.44 | −1.27 |
a miRNA-target interactions from the miRTarBase database, release 7.0, Homo sapiens (hsa) miRNAs.
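The FC columns above contain no values between −1 and 1, which is consistent with a signed fold-change convention: the tumor/normal ratio when expression is higher in tumor, and the negative reciprocal when it is lower. The source does not state the formula, so the sketch below is an assumption about that convention, and the function name and input values are hypothetical.

```python
def signed_fold_change(tumor_mean: float, normal_mean: float) -> float:
    """Signed fold change: the ratio if up in tumor, the negative reciprocal if down."""
    if tumor_mean <= 0 or normal_mean <= 0:
        raise ValueError("mean normalized read counts must be positive")
    ratio = tumor_mean / normal_mean
    return ratio if ratio >= 1 else -1.0 / ratio


# Hypothetical mean normalized read counts (assumption, not study data).
up = signed_fold_change(tumor_mean=849.0, normal_mean=100.0)
down = signed_fold_change(tumor_mean=100.0, normal_mean=703.0)
print(f"up: {up:.2f}, down: {down:.2f}")
```

Under this convention, the magnitude is always at least 1 and the sign alone gives the direction of change.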
Figure 2. Heatmap with hierarchical clustering of 39 of the 42 miRNA features within the Cancer Cell Line Encyclopedia (CCLE) across female primary “breast”, “ovary”, and “endometrium” carcinoma. Left columns show standard deviation (SD) and interquartile range (IQR) for each miRNA across all cell lines. Columns show cell line clusters grouped by miRNA expression signature similarity.
miRNA candidates for in vitro validations and their miRBase annotation, Homo sapiens (hsa) miRNA, release 21.
| Mature_Acc | Mature_ID | Mature_Seq |
|---|---|---|
| MIMAT0005899 | hsa-miR-1247-5p | ACCCGUCCCGUUCGUCCCCGGA |
| MIMAT0022721 | hsa-miR-1247-3p | CCCCGGGAACGUCGAGACUGGAGC |
| MIMAT0005797 | hsa-miR-1301-3p | UUGCAGCUGCCUGGGAGUGACUUC |
| MIMAT0000736 | hsa-miR-381-3p | UAUACAAGGGCAAGCUCUCUGU |
| MIMAT0004700 | hsa-miR-331-5p | CUAGGUAUGGUCCCAGGGAUCC |
| MIMAT0000760 | hsa-miR-331-3p | GCCCCUGGGCCUAUCCUAGAA |
| MIMAT0002809 | hsa-miR-146b-5p | UGAGAACUGAAUUCCAUAGGCU |
| MIMAT0004766 | hsa-miR-146b-3p | UGCCCUGUGGACUCAGUUCUGG |
| MIMAT0004506 | hsa-miR-33a-3p | CAAUGUUUCCACAGUGCAUCAC |
| MIMAT0004614 | hsa-miR-193a-5p | UGGGUCUUUGCGGGCGAGAUGA |
| MIMAT0000459 | hsa-miR-193a-3p | AACUGGCCUACAAAGUCCCAGU |
| MIMAT0005794 | hsa-miR-1296-5p | UUAGGGCCCUGGCUCCAUCUCC |
| MIMAT0015050 | hsa-miR-323b-3p a | CCCAAUACACGGUCGACCUCUU |
| MIMAT0000755 | hsa-miR-323a-3p a | CACAUUACACGGUCGACCUCU |
a The closely related mature sequence hsa-miR-323a-3p, MIMAT0000755, CACAUUACACGGUCGACCUCU, was included for wet-lab experiments.
Figure 3. Relative expression of selected miRNAs in cancer vs. normal cell lines. Left panel: miRNA expression in normal breast (MCF10A) and breast cancer (MCF7 and T47D) cell lines. Middle panel: miRNA expression in normal endometrial (HESC) and endometrial cancer (MFE-280 and EN) cell lines. Right panel: miRNA expression in normal ovary (OCE1) and ovarian cancer (ES2 and EFO21) cell lines. miRNA expression was evaluated by real-time PCR (2^−ΔΔCt method).
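As a worked illustration of the 2^−ΔΔCt calculation named in the caption: each Ct is first normalized to a reference transcript (ΔCt), the normal-line ΔCt is subtracted from the cancer-line ΔCt (ΔΔCt), and the result is exponentiated. The function name, the Ct values, and the presence of a single reference transcript are assumptions for illustration; the source does not name the reference used.

```python
def relative_expression(ct_target_sample: float, ct_ref_sample: float,
                        ct_target_control: float, ct_ref_control: float) -> float:
    """Relative quantity by the 2^-ddCt method."""
    delta_sample = ct_target_sample - ct_ref_sample      # dCt in the cancer line
    delta_control = ct_target_control - ct_ref_control   # dCt in the normal line
    delta_delta = delta_sample - delta_control           # ddCt
    return 2.0 ** (-delta_delta)


# Hypothetical Ct values (assumption): the target crosses threshold two cycles
# earlier in the cancer line, and the reference is unchanged.
rq = relative_expression(ct_target_sample=24.0, ct_ref_sample=18.0,
                         ct_target_control=26.0, ct_ref_control=18.0)
print(rq)
```

A value above 1 indicates higher expression in the cancer line than in the normal line, and a value below 1 indicates lower expression.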
Figure 4. Comparison of miRNA expression between TCGA tissues and cell lines used in vitro. (A) TCGA miRNA expression (log scale) of breast invasive carcinoma (n = 1096) and solid tissue normal (n = 104) compared with relative expression in breast cancer cell lines (MCF7 and T47D) and the normal cell line (MCF10A). (B) TCGA miRNA expression (log scale) of uterine corpus endometrial carcinoma (n = 545) and solid tissue normal (n = 33) compared with relative expression in endometrial cancer cell lines (MFE-280 and EN) and the normal cell line (HESC). Trend concordance between tissue and cell lines is indicated by a green dot, and non-concordance by a red dot. Tumor/normal differences were considered statistically significant at p-value < 0.05 (Wilcoxon test).
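The green/red concordance call in this figure can be read as a comparison of direction: a miRNA is concordant when its tumor/normal ratio in tissue and its cancer/normal ratio in cell lines fall on the same side of 1. The sketch below encodes that reading. It is an assumption about how concordance was scored, and the function names, miRNA labels, and ratios are hypothetical.

```python
import math


def trend(ratio: float) -> int:
    """Return +1 if the ratio is above 1, -1 if below 1, and 0 if exactly 1."""
    log_ratio = math.log2(ratio)
    return (log_ratio > 0) - (log_ratio < 0)


def is_concordant(tissue_ratio: float, cell_line_ratio: float) -> bool:
    """True when tissue and cell-line ratios change in the same direction."""
    tissue_trend = trend(tissue_ratio)
    return tissue_trend != 0 and tissue_trend == trend(cell_line_ratio)


# Hypothetical (tissue ratio, cell-line ratio) pairs; not study data.
examples = {
    "miR-A": (2.4, 1.8),  # up in both
    "miR-B": (0.5, 0.3),  # down in both
    "miR-C": (1.9, 0.6),  # opposite directions
}
calls = {name: is_concordant(t, c) for name, (t, c) in examples.items()}
print(calls)
```

This check covers direction only; statistical significance of the tumor/normal difference would need a separate test.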