Federica Farinella1, Mario Merone2, Luca Bacco3,4,5, Adriano Capirchio6,7, Massimo Ciccozzi8, Daniele Caligiore6,7.
Abstract
Ovarian cancer is one of the most common gynecological malignancies, ranking third after cervical and uterine cancer. High-grade serous ovarian cancer (HGSOC) is one of the most aggressive subtypes, and the late onset of its symptoms leads in most cases to an unfavourable prognosis. Current predictive algorithms for estimating the risk of ovarian cancer lack the sensitivity and specificity needed for widespread clinical use. Adding further biomarkers or parameters, such as age or menopausal status, has yielded only modest improvements. Novel molecular signatures and new predictive algorithms are therefore needed to support the diagnosis of HGSOC and, at the same time, to deepen the understanding of this elusive disease, with the ultimate goal of improving patient survival. Here, we apply a Machine Learning-based pipeline to an open-source HGSOC proteomic dataset to develop a decision support system (DSS) that showed high discriminative ability on a dataset of HGSOC biopsies. The proposed DSS consists of a two-step feature selection followed by a decision tree, and its output is a combination of three highly discriminating proteins, TOP1, PDIA4, and OGN, that could be of interest for further clinical and experimental validation. Furthermore, we used the ranked list of proteins generated during the feature selection steps to perform a pathway analysis that provides a snapshot of the main deregulated pathways in HGSOC. The datasets used for this study are available in the Clinical Proteomic Tumor Analysis Consortium (CPTAC) data portal (https://cptac-data-portal.georgetown.edu/).
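
The final stage of the DSS is a decision tree over a few selected proteins. The abstract does not describe the implementation, so the following is only a minimal sketch of a generic CART-style tree that picks axis-aligned thresholds by Gini impurity; the toy abundance values, the `max_depth` setting, and every function name are hypothetical and are not the authors' code or data.

```python
from collections import Counter

def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def partition(rows, labels, f, t):
    """Split samples on feature f at threshold t (left: value <= t)."""
    left = [(r, y) for r, y in zip(rows, labels) if r[f] <= t]
    right = [(r, y) for r, y in zip(rows, labels) if r[f] > t]
    return left, right

def best_split(rows, labels):
    """Return the (feature, threshold) with the lowest weighted Gini, or None if no split helps."""
    best, best_score = None, gini(labels)
    for f in range(len(rows[0])):
        values = sorted({r[f] for r in rows})
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2  # midpoint between consecutive observed values
            left, right = partition(rows, labels, f, t)
            score = sum(len(part) * gini([y for _, y in part]) for part in (left, right)) / len(labels)
            if score < best_score:
                best, best_score = (f, t), score
    return best

def build_tree(rows, labels, depth=0, max_depth=3):
    """Grow a depth-limited tree; a leaf is the majority label of its samples."""
    majority = Counter(labels).most_common(1)[0][0]
    if depth >= max_depth or len(set(labels)) == 1:
        return majority
    split = best_split(rows, labels)
    if split is None:
        return majority
    f, t = split
    children = [build_tree([r for r, _ in part], [y for _, y in part], depth + 1, max_depth)
                for part in partition(rows, labels, f, t)]
    return {"feature": f, "threshold": t, "left": children[0], "right": children[1]}

def predict(tree, row):
    """Follow thresholds from the root until a leaf label is reached."""
    while isinstance(tree, dict):
        tree = tree["left"] if row[tree["feature"]] <= tree["threshold"] else tree["right"]
    return tree

# Toy abundances for three hypothetical protein features (not study data).
X = [[0.1, 2.0, 5.0], [0.2, 1.8, 4.5], [1.5, 0.3, 0.9], [1.7, 0.2, 1.1]]
y = ["Non-tumor", "Non-tumor", "Tumor", "Tumor"]
tree = build_tree(X, y)
print(predict(tree, [1.6, 0.25, 1.0]))  # → Tumor
```

Candidate thresholds are taken only between distinct observed values, so every accepted split leaves samples on both sides; in a real pipeline the feature columns would be the proteins retained by feature selection.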
Year: 2022 PMID: 35197484 PMCID: PMC8866540 DOI: 10.1038/s41598-022-06788-2
Source DB: PubMed Journal: Sci Rep ISSN: 2045-2322 Impact factor: 4.379
Figure 1. Machine Learning pipeline.
Results of the correlation analysis between the proteomic data and the tumor phenotype. Most proteins (6086 of 6223) showed no evident correlation; of the 137 correlated proteins, the majority (117) were negatively correlated.
| Correlation with tumor phenotype | Number of proteins |
|---|---|
| Positive correlation | 20 |
| Negative correlation | 117 |
| No correlation | 6086 |
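
The caption does not state which correlation coefficient or cutoff separates the three bins. As a hedged illustration only, the sketch below assumes a Pearson correlation between each protein's abundance and a binary label (1 = Tumor, 0 = Non-tumor), which is the point-biserial correlation, and a hypothetical cutoff of |r| ≥ 0.5; the protein names and values are toy inputs.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        return 0.0  # a constant vector carries no correlation signal
    return cov / (sx * sy)

def bin_proteins(abundances, labels, cutoff=0.5):
    """Count proteins per correlation bin; cutoff is a hypothetical threshold."""
    counts = {"positive": 0, "negative": 0, "none": 0}
    for values in abundances.values():
        r = pearson(values, labels)
        if r >= cutoff:
            counts["positive"] += 1
        elif r <= -cutoff:
            counts["negative"] += 1
        else:
            counts["none"] += 1
    return counts

# 1 = Tumor, 0 = Non-tumor; abundances are toy values, not study data.
labels = [1, 1, 1, 0, 0, 0]
abundances = {
    "P1": [5.0, 4.8, 5.2, 1.0, 1.1, 0.9],  # higher in tumor
    "P2": [0.5, 0.4, 0.6, 3.0, 3.2, 2.9],  # lower in tumor
    "P3": [2.0, 1.0, 3.0, 2.0, 1.0, 3.0],  # no relationship with the label
}
print(bin_proteins(abundances, labels))
```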
Figure 3. A subnetwork was extracted from the main network to improve interpretability. Red and blue nodes represent upregulated (A, B) and downregulated (C) pathways, respectively. Node diameter is proportional to the number of proteins in the pathway. Pathways that share proteins are connected by blue edges, whose thickness is proportional to the number of shared proteins. Node clusters were annotated manually.
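
The network in Figure 3 links pathways through their shared proteins. A minimal sketch of that overlap construction follows, assuming each pathway is represented as a set of protein identifiers; the pathway names, members, and the `min_shared` parameter are hypothetical, and the tool the authors used to draw the network is not specified here.

```python
from itertools import combinations

def overlap_network(pathways, min_shared=1):
    """pathways: dict name -> set of proteins.
    Returns (node_sizes, edges); edges maps (a, b) -> number of shared proteins."""
    node_sizes = {name: len(members) for name, members in pathways.items()}
    edges = {}
    for a, b in combinations(sorted(pathways), 2):
        shared = len(pathways[a] & pathways[b])
        if shared >= min_shared:
            edges[(a, b)] = shared  # edge thickness would scale with this count
    return node_sizes, edges

pathways = {
    "Pathway A": {"P1", "P2", "P3"},
    "Pathway B": {"P2", "P3", "P4", "P5"},
    "Pathway C": {"P6"},
}
sizes, edges = overlap_network(pathways)
print(sizes)  # {'Pathway A': 3, 'Pathway B': 4, 'Pathway C': 1}
print(edges)  # {('Pathway A', 'Pathway B'): 2}
```

Node size maps to the set size and edge weight to the intersection size, mirroring the diameter and thickness encodings described in the caption.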
Figure 2. Final decision tree, with a focus on the biomarkers.
This confusion matrix was obtained with fivefold cross-validation on the CPTAC Ovarian Cancer Confirmatory Study proteomic dataset (209 samples). The matrix compares the actual target values (Truth) with those predicted (Pred.) by our model. Correctly classified samples lie on the main diagonal; misclassified samples lie off the diagonal.
| Pred. \ Truth | Non-tumor | Tumor |
|---|---|---|
| Non-tumor | 40 | 3 |
| Tumor | 1 | 165 |
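
Taking Tumor as the positive class, the cross-validation matrix gives TP = 165, FN = 3, FP = 1 and TN = 40. The short sketch below derives common summary metrics from these four counts; the choice of metrics is ours for illustration, since the caption reports only the matrix itself.

```python
def confusion_metrics(tp, fn, fp, tn):
    """Standard binary metrics with Tumor as the positive class."""
    total = tp + fn + fp + tn
    return {
        "accuracy": (tp + tn) / total,
        "sensitivity": tp / (tp + fn),  # recall on Tumor samples
        "specificity": tn / (tn + fp),  # recall on Non-tumor samples
        "precision": tp / (tp + fp),    # fraction of Tumor calls that are correct
    }

# Counts read from the cross-validation matrix (rows = Pred., columns = Truth).
m = confusion_metrics(tp=165, fn=3, fp=1, tn=40)
for name, value in m.items():
    print(f"{name}: {value:.4f}")
```

With these counts the accuracy is 205/209 ≈ 0.981, the sensitivity 165/168 ≈ 0.982 and the specificity 40/41 ≈ 0.976.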
This confusion matrix reports the performance of our system trained on the CPTAC Ovarian Cancer Confirmatory Study proteomic dataset and tested on the TCGA Cancer Proteome Study of Ovarian Tissue (216 samples). The matrix compares the actual target values (Truth) with those predicted (Pred.) by our model. Correctly classified samples lie on the main diagonal; misclassified samples lie off the diagonal. The TCGA dataset contains only samples from the Tumor class.
| Pred. \ Truth | Non-tumor | Tumor |
|---|---|---|
| Non-tumor | 0 | 6 |
| Tumor | 0 | 210 |
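
Because the TCGA set contains only Tumor samples, only metrics conditioned on the Tumor class are defined: 210 of 216 tumors are recognized (sensitivity ≈ 0.972), while specificity has a zero denominator. A small sketch that makes this explicit by returning `None` for undefined ratios, a convention assumed here for illustration:

```python
def safe_ratio(num, den):
    """Return num/den, or None when the denominator is zero (metric undefined)."""
    return num / den if den else None

def external_metrics(tp, fn, fp, tn):
    """Metrics with Tumor as the positive class; undefined ones come back as None."""
    return {
        "sensitivity": safe_ratio(tp, tp + fn),
        "specificity": safe_ratio(tn, tn + fp),
        "accuracy": safe_ratio(tp + tn, tp + fn + fp + tn),
    }

# Counts read from the TCGA matrix: no Non-tumor samples exist in this set.
tcga = external_metrics(tp=210, fn=6, fp=0, tn=0)
print(tcga)
```

On a single-class test set, accuracy collapses to sensitivity, so the external validation speaks only to how well tumors are detected.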
Summary of the top 100 most deregulated pathways, ranked by normalized enrichment score (NES) and selected from the pathways composing the subnetwork in Fig. 3. Pathways are named by their Gene Ontology name or their standard name. The left column lists the 50 pathways found to be least represented in HGSOC tumor biopsies; a lower NES corresponds to lower representation. The right column lists the 50 most over-represented pathways; a higher NES corresponds to higher over-representation.
| Less-represented pathway | NES | Over-represented pathway | NES |
|---|---|---|---|
| Regulation of vascular smooth muscle cell proliferation | −1.8195 | Pre-mRNA splicing | 3.4016 |
| Positive regulation of phospholipid metabolic process | −1.818 | mRNA Splicing | 3.3727 |
| Neutrophil chemotaxis | −1.8175 | Regulation of mRNA processing | 3.3537 |
| Positive regulation of lipid transport | −1.8168 | Cap-dependent translation initiation | 3.2584 |
| Positive regulation of protein kinase B signaling | −1.8157 | rRNA processing | 3.2518 |
| IGF1R signaling cascade | −1.8154 | rRNA processing in the nucleus and cytosol | 3.2488 |
| Allograft rejection | −1.8151 | Influenza viral RNA transcription and replication | 3.2475 |
| Positive regulation of transporter activity | −1.8148 | Influenza infection | 3.2379 |
| PID_IFNG_PATHWAY | −1.8141 | Major pathway of rRNA processing in the nucleolus and cytosol | 3.2266 |
| BIOCARTA_BIOPEPTIDES_PATHWAY | −1.8141 | L13a-mediated translational silencing of ceruloplasmin expression | 3.2208 |
| Regulation of heart rate | −1.8134 | Spliceosomal complex | 3.2119 |
| Tertiary granule lumen | −1.8111 | Viral gene expression | 3.2043 |
| PID_CXCR4_PATHWAY | −1.8088 | Eukaryotic translation initiation | 3.1978 |
| Negative regulation of small molecule metabolic process | −1.8082 | GTP hydrolysis and joining of the 60S ribosomal subunit | 3.1919 |
| Negative regulation of cell-substrate adhesion | −1.8075 | Regulation of mRNA splicing, via spliceosome | 3.1886 |
| Regulation of glucose transmembrane transport | −1.8065 | Cytosolic ribosome | 3.1753 |
| Monocarboxylic acid transport | −1.8039 | Ribosome | 3.1709 |
| Positive regulation of cholesterol transport | −1.8038 | Formation of a pool of free 40S subunits | 3.1677 |
| Gastrin signaling pathway | −1.8037 | Viral transcription | 3.1615 |
| Activation of MAPKK activity | −1.8037 | Ribosomal subunit | 3.1467 |
| Cortical cytoskeleton | −1.8036 | Structural constituent of ribosome | 3.1382 |
| Amine metabolic process | −1.8035 | Eukaryotic translation elongation | 3.137 |
| Negative regulation of cell projection organization | −1.8027 | Translational initiation | 3.1285 |
| PID_ERBB1_DOWNSTREAM_PATHWAY | −1.8018 | Peptide chain elongation | 3.1255 |
| Negative regulation of neuron projection development | −1.8012 | Regulation of RNA splicing | 3.122 |
| IRS-related events triggered by IGF1R | −1.8001 | SRP-dependent cotranslational protein targeting to membrane | 3.1111 |
| Growth factor receptor binding | −1.7996 | Nonsense mediated decay (NMD) independent of the exon junction complex (EJC) | 3.1079 |
| Regulation of reactive oxygen species biosynthetic process | −1.799 | Viral mRNA translation | 3.1065 |
| Neuronal system | −1.7989 | Eukaryotic translation termination | 3.0998 |
| Negative regulation of axonogenesis | −1.7965 | HALLMARK_MYC_TARGETS_V1 | 3.0941 |
| Opioid signalling | −1.7963 | Response of EIF2AK4 (GCN2) to amino acid deficiency | 3.0812 |
| Cell–cell adhesion via plasma-membrane adhesion molecules | −1.7957 | Protein targeting to ER | 3.0792 |
| BIOCARTA_HER2_PATHWAY | −1.7956 | Nonsense mediated decay (NMD) enhanced by the exon junction complex (EJC) | 3.0764 |
| PID_ERBB1_RECEPTOR_PROXIMAL_PATHWAY | −1.795 | Nonsense-mediated decay (NMD) | 3.0737 |
| Phosphatidylinositol binding | −1.7946 | Catalytic step 2 spliceosome | 3.072 |
| Phosphatidic acid biosynthetic process | −1.7934 | Selenocysteine synthesis | 3.0554 |
| Granulocyte chemotaxis | −1.7913 | SRP-dependent cotranslational protein targeting to membrane | 3.0479 |
| Regulation of blood vessel endothelial cell migration | −1.791 | Establishment of protein localization to endoplasmic reticulum | 3.0463 |
| B cell receptor signaling pathway | −1.7905 | Regulation of expression of SLITs and ROBOs | 3.0373 |
| Monocarboxylic acid binding | −1.7896 | Cotranslational protein targeting to membrane | 3.0326 |
| Toll-like receptor cascades | −1.7875 | Nuclear-transcribed mRNA catabolic process, nonsense-mediated decay | 3.0094 |
| Regulation of calcium-mediated signaling | −1.7874 | Regulation of alternative mRNA splicing, via spliceosome | 3.0092 |
| Triglyceride metabolism | −1.7864 | Selenoamino acid metabolism | 2.972 |
| Multicellular organismal movement | −1.7857 | Protein localization to endoplasmic reticulum | 2.9604 |
| Hydrogen peroxide catabolic process | −1.7848 | Ribonucleoprotein complex assembly | 2.9379 |
| Negative regulation of cellular response to growth factor stimulus | −1.7846 | Ribonucleoprotein complex subunit organization | 2.9292 |
| Gamma carboxylation, hypusine formation and arylsulfatase activation | −1.7846 | Activation of the mRNA upon binding of the cap-binding complex and eIFs, and subsequent binding to 43S | 2.9227 |
| Regulation of sodium ion transport | −1.7843 | rRNA processing | 2.9199 |
| Detection of external stimulus | −1.7843 | mRNA Processing | 2.9151 |
| Regulation of Rho protein signal transduction | −1.7842 | Translation initiation complex formation | 2.8601 |
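
NES is the normalized enrichment score of gene set enrichment analysis (GSEA): an enrichment score (ES) is the signed maximum deviation of a running sum that walks down a ranked list, stepping up at set members and down otherwise, and it is then normalized by the mean same-signed ES of random sets. The sketch below assumes the classic weighted ES with exponent p = 1 and a simple gene-set permutation null of the kind used in preranked analyses; the ranked list, the sets, `n_perm`, and `seed` are hypothetical, and the study's exact GSEA settings are not given in this section.

```python
import random

def enrichment_score(ranked, scores, gene_set, p=1.0):
    """GSEA-style ES: signed maximum deviation of the running sum.
    ranked: IDs sorted from highest to lowest score; scores: ID -> ranking metric."""
    hits = [g in gene_set for g in ranked]
    n_miss = len(ranked) - sum(hits)
    hit_weight = sum(abs(scores[g]) ** p for g, h in zip(ranked, hits) if h)
    if hit_weight == 0 or n_miss == 0:
        raise ValueError("set must contain weighted hits and leave some misses")
    running, best = 0.0, 0.0
    for g, h in zip(ranked, hits):
        # Hits step up by their weighted share; misses step down uniformly.
        running += abs(scores[g]) ** p / hit_weight if h else -1.0 / n_miss
        if abs(running) > abs(best):
            best = running
    return best

def normalized_es(ranked, scores, gene_set, n_perm=1000, seed=0):
    """NES: ES divided by the mean |ES| of same-signed scores from random sets of equal size."""
    rng = random.Random(seed)
    es = enrichment_score(ranked, scores, gene_set)
    size = len(set(gene_set) & set(ranked))
    null = [enrichment_score(ranked, scores, set(rng.sample(ranked, size)))
            for _ in range(n_perm)]
    same_sign = [abs(v) for v in null if v != 0 and (v > 0) == (es > 0)]
    if not same_sign:
        return float("nan")
    return es / (sum(same_sign) / len(same_sign))

# Hypothetical ranked list of ten proteins (not study data).
values = [3.0, 2.5, 2.0, 1.5, 1.0, -0.5, -1.0, -1.5, -2.0, -2.5]
scores = {f"P{i}": v for i, v in enumerate(values)}
ranked = sorted(scores, key=scores.get, reverse=True)
top_set, bottom_set = {"P0", "P1", "P2"}, {"P7", "P8", "P9"}
print(enrichment_score(ranked, scores, top_set), normalized_es(ranked, scores, top_set))
print(enrichment_score(ranked, scores, bottom_set), normalized_es(ranked, scores, bottom_set))
```

A set concentrated at the top of the ranking gets a positive NES (over-represented), and one concentrated at the bottom gets a negative NES (less represented), matching the sign convention of the two table columns.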