
Contextualizing the Genes Altered in Bladder Neoplasms in Pediatric and Teen Patients Allows Identifying Two Main Classes of Biological Processes Involved and New Potential Therapeutic Targets.

A Porrello1, R B Piergentili2.   

Abstract

Research on bladder neoplasms in pediatric and teen patients (BNPTP) has described 21 genes, which are variously involved in this disease and are mostly responsible for deregulated cell proliferation. However, due to the limited number of publications on this subject, it is still unclear what type of relationships there are among these genes and which are the chances that, while having different molecular functions, they i) act as downstream effector genes of well-known pro- or anti- proliferative stimuli and/or interplay with biochemical pathways having oncological relevance or ii) are specific and, possibly, early biomarkers of these pathologies. A Gene Ontology (GO)-based analysis showed that these 21 genes are involved in biological processes, which can be split into two main classes: cell regulation-based and differentiation/development-based. In order to understand the involvement/overlapping with main cancer-related pathways, we performed a meta-analysis dependent on the 189 oncogenic signatures of the Molecular Signatures Database (OSMSD) curated by the Broad Institute. We generated a binary matrix with 53 gene signatures having at least one hit; this analysis i) suggests that some genes of the original list show inconsistencies and might need to be experimentally re- assessed or evaluated as biomarkers (in particular, ACTA2) and ii) allows hypothesizing that important (proto)oncogenes (E2F3, ERBB2/HER2, CCND1, WNT1, and YAP1) and (putative) tumor suppressors (BRCA1, RBBP8/CTIP, and RB1-RBL2/p130) may participate in the onset of this disease or worsen the observed phenotype, thus expanding the list of possible molecular targets for the treatment of BNPTP.

Keywords:  Bioinformatics; PubMed query; biomedical modeling; gene set; neoplastic transformation

Year:  2016        PMID: 27013923      PMCID: PMC4780474          DOI: 10.2174/1389202916666151014222603

Source DB:  PubMed          Journal:  Curr Genomics        ISSN: 1389-2029            Impact factor:   2.236


INTRODUCTION

According to the US National Cancer Institute (NCI) website [1], bladder cancer (BC) is classified, in adults, among the most common cancers in the USA, where almost 75,000 new BC cases and more than 15,000 BC-related deaths have been estimated for the year 2014. In particular, these statistics show that in 2013 about 55,000 men and 18,000 women were diagnosed with bladder cancer, with a ratio of approximately 3:1; consequently, BC is reported by NCI as the sixth cause of new cancer cases, and the eighth cause of death among common malignancies [1]. However, this website classifies BC among ‘unusual cancers of childhood’; indeed, collecting statistical data about pediatric/teenage BC using on-line resources is challenging, due to the limited number of reported cases. As a matter of fact, the reported incidence of BC in the population, in the age range 15-19 years, is 1·10⁻⁶, and even lower in younger patients [2]. In agreement with the data coming from the United States, the total number of BC cases described in the literature in the age range 0-19 is limited to a few thousand patients, all-time and worldwide (Table ). An extensive analysis of all known cases of pediatric/teen malignant bladder masses [2] revealed that at least ten different types/subtypes of BC may be found in these patients, although in some cases with a rare or very rare incidence (Table ). In a few respects, BC in children and teenagers behaves like adult tumors. The first symptom is usually a painless macroscopic hematuria; as in adults, males are more affected than females, and in some cases a genetic background and a relationship with specific chemicals and pollutants have been identified. However, many features of pediatric/teen BC are specific to this age range and suggest that this pathology is quite different from its adult counterpart.
In particular, pediatric/teen BC is usually unifocal, while most adults have multifocal masses; most cases in this age range are of lower grade and stage (despite their delayed diagnosis, which is often due to being unexpected) and have an indolent behavior, thus granting these young patients a far more favorable prognosis and a recurrence rate much lower than in adults [16]. Genetic and genomic alterations frequently seen in older adults are extremely rare in young patients and, despite the constantly higher incidence in males, the male/female ratio is age-dependent, being greater at younger ages [2, 16, 17]. This led some Authors [17] to hypothesize the presence of an age-dependent threshold, set approximately at 19-20 years, which determines different properties in youngsters vs. adults. Along the same line of thought, other Authors suggest that the management and treatment of young BC patients should be modified as well with respect to their adult counterparts; in particular, special attention should be used in preserving both the structural and functional physiology of patients (e.g. sexual functions in females, potency in males, fertility and urinary continence in both genders) [16]. As a consequence of the low number of reported cases, the genetic analyses of pediatric and teen BC patients are even rarer. This is not a trivial problem since, as noted above, the phenotypic patterns of these malignancies are typically different from adults; thus, it is likely that also their overall genetic background is different. In this perspective, it would not be a surprise to find out that different sets of genes are involved in the same tumor, according to patient’s age, regardless of similar histological types, or even that some genes expressed in adult cancers are not involved in their pediatric/teen counterparts, and vice versa. 
Trying to find answers to these questions, we analyzed what is known about BC genetics during childhood and adolescence; the 21 genes examined in this article were clearly and univocally identified in patients with bladder neoplasms whose age was ≤ 19 years. In this way, we were able to link the following genes to BNPTP (note: p53, NF1 and SMARCB1 are listed twice below, because they were identified using two different methods): i) CK20, p16/Ink4, SMARCB1, ALK, ACTA2, CD34, CD56, MUC1, p53, WT1, VIM and MYOG by immunohistochemistry; ii) H-RAS and NF1 by genetics, the patient being also affected by other known genetic diseases, namely Costello syndrome and neurofibromatosis, respectively; iii) K-RAS, N-RAS, NF1, p53, PTPN11 and SMARCB1 by gene sequencing; iv) Gli1, Gli3, Myf5, MyoD1, Ptch1 by mRNA quantification (microarray analysis). Gene Ontology (GO) ‘is a community-based bioinformatics resource that supplies information about gene product function using ontologies to represent biological knowledge’ [18]. GO-based computational tools, in particular, are used for performing analyses that allow determining which biological processes (BP), cellular components (CC) and molecular functions (MF) are most involved in a pathology, experimental condition, cellular response to stimuli, etc. [19, 20]. We found that 75 GO-BP terms reach statistical significance for our list of genes; this number drops to 26 using a semantic similarity algorithm for redundancy reduction. The 26 GO-BP categories that are left after this filtering involve: i) regulative processes (cell signaling, metabolism, matrix adhesion, intracellular transport, etc.) and ii) differentiation and development. Looking at the GO results from a different standpoint, we were also able to distinguish GO terms that suggest similarities between adults and children/teenagers (e.g. 
‘positive regulation of nucleobase-containing compound metabolism’, ‘regulation of intracellular transport’, ‘actin filament-based process’, and ‘cytoskeleton organization’) and others that seem to be peculiar to BNPTP (‘gland development’, ‘metanephros development’, ‘striated muscle cell differentiation’, ‘regionalization’, ‘pattern specification process’, and ‘embryonic morphogenesis’). An assessment of possible biological events happening upstream of these 21 effector genes, and of mechanisms capable of interfering with them in different ways, was performed by i) evaluating which genes in this list are downstream of the 189 oncogenic signatures of the Molecular Signatures Database (OSMSD) and ii) checking whether the nature of the perturbation of the possible upstream genes that were found was compatible with the genomic perturbations described in BNPTP. A standard and powerful use of these 189 gene signatures is accomplished inside the Gene Set Enrichment Analysis (GSEA) framework [21] or using other gene set-based tools [22-24]. Many publications based on GSEA and similar methods have proven that it is possible to use evidence derived from a broad spectrum of experiments (e.g. on human and murine models) and apply it to the analysis of two different groups of samples/patients [25-27]. This is an intrinsically noisy process that GSEA manages by relying on the robustness of sets of genes, rather than single genes. Since we do not have, instead, any high-throughput data to start from and our only input is a list of deregulated/altered genes, we decided to rely on the strength of the inter-experimental biological compatibility described above and accept a higher rate of false discoveries for each relevant case found in the hit matrix, which has 53 rows (gene sets), 21 columns (BNPTP genes) and 60 hits (matches between a gene set and a BNPTP gene).
A number of possibly important upstream events were defined using this method, allowing connecting the 21 genes that are altered in BNPTP with the regulation of (proto)oncogenes and (putative) tumor suppressors, such as BRCA1, CCND1, RBBP8/CTIP, E2F3, ERBB2/HER2, WNT1, YAP1 and RB1-RBL2/p130 combined (see the Results section). Despite the fact that TP53 (a.k.a. tumor protein p53) is the most frequently mutated gene in BC specimens from adults [28], its status is not well-defined in the only (teenage) patient reported with TP53 alterations [17] and our gene set-based analyses were not able to determine if and how much TP53 is important in pediatric and teenage BC cases. Notably, our results advocate for a role played by CTIP and WNT1, which instead do not seem to be involved as pivotal genes in BC of adults, as part of the oncogenic signaling pathways of BNPTP.

MATERIALS AND METHODS

Literature-based Gene Selection

The literature used for identifying BNPTP genes was selected using quite stringent criteria, since the analyses described in this article are sensitive to the presence/absence of single genes in the final list. MEDLINE was accessed multiple times, up to February 2015, and searched through the PubMed search engine using the strings “bladder cancer genetics pediatric” (which retrieved 61 results), “bladder cancer gene pediatric” (54 results), “bladder cancer genetics child” (116 results), “bladder cancer gene children” (84 results), “bladder cancer gene child” (69 results) and “bladder cancer genetics children” (131 results). We chose the standard PubMed search and decided not to use MeSH terms and Boolean operators, in order to maximize the number of hits. The results were merged into one list and compared/integrated with the references available from Vallasciani and coworkers [2]. To expand the potentially suitable literature, also references cited in the manuscripts selected so far, but absent from our initially merged list, were checked, and the neoplastic alterations identified (such as “rhabdomyosarcoma”, “transitional cell carcinoma” and others similar) were used for further PubMed searches together with the keywords “bladder cancer” and either one of the following: “child”, “children” or “pediatric”. Any additional hit coming from this new search and not present in the former list was checked for its content. Then, all references were quality-checked, and only those simultaneously fulfilling the following three requirements were considered: (1) the gene had to be undoubtedly and univocally identified in the cancer specimen, either by (a) gene sequencing, (b) mRNA level quantification by microarray, (c) protein expression by immunohistochemistry (provided the absence of cross-reactions) either using immunoblotting, or kinase assay, or indirect immunofluorescence on sample slides, or tissue microarray, or (d) analysis of chromosome rearrangements by FISH. 
For example, an article from Scott and collaborators [29] identified the over-expression of high molecular weight cytokeratins using monoclonal antibodies. However, an accurate check of the same manuscript [29] and of the web site of the Company that sells this antibody [30] revealed that it recognizes at least four different cytokeratins, and for this reason this report was not used for our gene list. (2) Patient’s age had to be clearly indicated and not to be higher than 19 years, or because stated for the patient(s), one by one, or because the age range reported was lower than 19 years old for all patients examined and unequivocally identified. Therefore, reports in which the age range was within our upper limit, even if single patients’ ages were not specified, met our selection criteria [31, 32]. (3) The primary tumor had to be localized in the bladder. This requirement prevented us from using several contributions about the rhabdomyosarcoma (RMS) genetics, since in many cases of pediatric/teen reports the primary localization of this tumor is not described (RMS specimens are usually merged, irrespective of their explantation origin). For example, we were not able to find any paper describing the involvement - in bladder RMS - of the PAX-FOXO1 fusion gene, which is one of the most common genetic alterations found in RMS occurring in other locations [33]. The final list of references that was used for the present report included 16 manuscripts published between 1989 and 2014 (Table ; references inside), allowing the identification of 21 genes.

Gene Ontology (GO) Analysis

The 21 genes identified in the above screening were combined and a GO analysis was performed on them. Indeed, while, as reported in Table , the nature of the alteration and the gene status found for each are very different and involve mRNA level, protein level or protein activity, both in terms of up- or down-regulation and detectable presence/absence, they all share the status of BNPTP biomarkers. Additionally, we assumed that the histological heterogeneity of tumors described was partially representative of the heterogeneity in the population and considered their aggregation as a balanced methodological choice, especially in the light of recent trends in medicine. As a matter of fact, our approach i) can be seen as more restrictive of the philosophy that inspires basket trials (for patients sharing some features, independently of their tumor histology) [46, 47], since we maintained the sharp boundaries of including only pediatric and teen BC and ii) is focused on sifting out the shared biological themes of malignancies that affect the bladder in pre-adults and not on defining biological processes that are specific of the single tumor entries of Tables and . This GO analysis relies on the Expression Analysis Systematic Explorer (EASE) score (a p-value obtained through an adjusted Fisher’s exact test) [48] and was performed using DAVID Bioinformatics Resources [49]; the selected background was ‘Homo sapiens’. Each GO category was considered for further analyses only when fulfilling these three criteria: 1) is referred to biological processes (BP); 2) has two or more gene members inside the BNPTP gene list (Table ); 3) has a p-value lower than 0.01 (Table ). This third choice was made in order to collect GO-BP categories that account for at least 75% of the genes belonging to the original BNPTP gene list and to keep, among the statistically significant GO terms, > 33% of the GO categories that fulfil 1) and 2). 
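The three selection criteria above can be sketched as a simple filter. The snippet below is only an illustration and not the DAVID implementation: the `GoTerm` record and the helper names are hypothetical, and the EASE-style penalty is assumed here to be a one-sided Fisher's exact (hypergeometric) test computed after removing one hit from the gene list.

```python
from dataclasses import dataclass
from math import comb


def hypergeom_upper_tail(k, n, K, N):
    """P(X >= k) when n genes are drawn from a background of N genes,
    K of which are annotated to the GO term (one-sided Fisher's exact test)."""
    if k <= 0:
        return 1.0
    favourable = sum(comb(K, i) * comb(N - K, n - i)
                     for i in range(k, min(n, K) + 1))
    return favourable / comb(N, n)


def ease_like_score(k, n, K, N):
    # Assumption: the conservative adjustment removes one hit from the list,
    # i.e. it tests k - 1 hits in a list of n - 1 genes.
    return hypergeom_upper_tail(k - 1, n - 1, K, N)


@dataclass
class GoTerm:
    name: str           # GO term label
    ontology: str       # "BP", "CC" or "MF"
    genes_in_list: int  # members of the term found in the input gene list
    p_value: float      # score assigned to the term


def keep_term(term, p_cutoff=0.01, min_genes=2):
    """Criteria 1-3 of the text: BP only, two or more genes, p-value below 0.01."""
    return (term.ontology == "BP"
            and term.genes_in_list >= min_genes
            and term.p_value < p_cutoff)
```

Under this assumption the adjusted score is never smaller than the plain Fisher p-value, which is why it penalizes categories supported by very few genes.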
REVIGO [50] and Cytoscape [51] were respectively used for summarizing (reduction of the semantic redundancy of the GO terms) and visualizing the GO-BP results that comply with the three aforementioned criteria. The level of ‘allowed similarity’ was 0.5 (classified by the REVIGO developers as ‘small’), the selected species was ‘Homo sapiens’ and the chosen semantic similarity measure for assessing the distance between two GO terms was SimRel. SimRel is calculated from the directed acyclic graph (DAG) of the GO terms and, for each couple of GO categories, accounts for: i) the information content (IC) of their most informative common ancestors (MICA) in the graph; ii) the IC of the two categories that are compared; iii) the MICA probability of annotation [52]. The REVIGO algorithm agglomerates GO terms and defines their level of dispensability through a procedure that is conceptually similar to a hierarchical clustering where, after calculating all the pairwise SimRel distances, GO terms are selected for the summarizing graph or dropped according to 1) their biological specificity, based on how many proteins they tag in the GO database (see below), 2) their p-values (previously calculated by DAVID), 3) the existing parent-child relationships in the GO-DAG [50]. Table shows how much the GO-BP terms found using DAVID Bioinformatics Resources are dispensable in a scale from 0 (100% indispensable) to 1 (100% dispensable). One of the output files generated by REVIGO was used as the input for creating a Cytoscape graph [51]; in this computational step, the European Molecular Biology Laboratory (EMBL) - European Bioinformatics Institute (EBI) GO Annotation (GOA) database is used for ultimately tagging the summarizing GOs of the Cytoscape network [53, 54]. Using two different GO databases, one for assigning the p-values and one for selecting the most representative GO terms, slightly improves the consistency of this process. 
Notably, while DAVID measures how relevant is the contribution of the 21 BNPTP genes in each GO category, REVIGO defines topological relationships among the selected categories (in our case, 26) as a whole, i.e., without accounting for how many of the original BNPTP genes belong to them. In this way, our approach considers the role played by a GO category as a working hypothesis and displays its semantic similarity network as if that GO category was fully involved in the BPs of these malignant neoplasms.
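The three ingredients of SimRel listed above (IC of the common ancestors, IC of the two compared terms, and the ancestor's probability of annotation) can be combined as in the following sketch. It assumes the commonly published form of the measure and takes annotation probabilities as plain numbers; the function name and arguments are hypothetical, and this is not the REVIGO code.

```python
import math


def sim_rel(p_term1, p_term2, p_common_ancestors):
    """SimRel-style similarity between two GO terms.

    p_term1, p_term2: annotation probabilities of the two terms (0 < p <= 1).
    p_common_ancestors: annotation probabilities of their common ancestors.
    Information content is IC(t) = -log p(t).
    """
    denominator = math.log(p_term1) + math.log(p_term2)
    if denominator == 0.0:
        # Both terms have zero information content (e.g. two root terms).
        return 0.0
    best = 0.0
    for p_anc in p_common_ancestors:
        # Ratio of the ancestor's IC to the ICs of the compared terms,
        # down-weighted by how frequently the ancestor is annotated.
        score = (2.0 * math.log(p_anc) / denominator) * (1.0 - p_anc)
        best = max(best, score)
    return best
```

A root-like ancestor (probability 1) contributes zero, so two terms that share only the root are treated as unrelated.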

Identification of Possible Upstream Inducers and Additive/synergistic Effects Using Oncogenic Signatures

The 189 gene sets (oncogenic signatures) that were used for these analyses were obtained from the Molecular Signatures Database of the Broad Institute. One of their main uses has been comparing two groups of samples and determining if the expression values of the genes that belong to each gene set suggest that in either group those genes have significantly higher expression values (this is called ‘enrichment’), using a Kolmogorov-Smirnov-like statistic [21]. In our analysis, instead, the occurrences of the 21 BNPTP genes in the OSMSD are calculated by running an in-house developed MATLAB [55] script, thus obtaining a binary matrix M (m_{i,j} = 1 when the gene j is found in the gene set i (this is called a ‘hit’) and 0 otherwise). For the sake of brevity, M is named hit matrix and when in the text we comment on the existence of a match between a gene set and a gene, we use phrases such as “the gene X is hit by the gene set Y”, “the gene set Y is hit by the gene X”, etc. Also for short we use phrases such as “pro-growth”, “anti-growth”, “pro-proliferative”, “anti-proliferative”, and similar expressions, for the gene sets, which are far from fully depicting the complexity of the experiments performed for defining them, but are supposed to help the Reader to quickly determine the biological background and/or polarity of what is shown [56]. Remarkably, the 21 genes altered in BNPTP have different cell functions: besides oncogenes and tumor suppressors, there are cytoskeleton components, cell-cell adhesion factors, and so on (Table ). Therefore, looking for matches in the hit matrix can be seen as a convenient backtracking strategy, which allows shifting the analysis focus from the identified BNPTP genes to potential (and not yet identified) inducers/co-regulators that mostly belong to the families of oncogenes and tumor suppressors. We want to specify that from here to the end of this article the words oncogenes, tumor suppressors and similar may be loosely used, for short.
Indeed, for the sake of our evaluations, a gene officially classified as an oncogene or a gene able to promote cell cycle progression, cell growth, and comparable/related biological processes, merge into the same experimental gene group; similarly for tumor suppressors vs. anti-proliferative, pro-apoptotic or likewise defined genes. However, this descriptive style is applied only to groups of genes; instead, when a statement is made about a specific gene, it is defined and referenced as accurately as we deemed necessary. After finding all the gene hits, a spreadsheet was generated, aiming at summarizing and displaying in a user-friendly way the available data. The matrix columns were annotated with information about the 21 genes, using the following fields: 1) gene alteration, 2) description of the experimental evidence, 3) number of patients on which those data are based (that we also relate to the degree of reliability of that gene), and 4) putative activity type (Table ). Additionally, each gene set was annotated with information found in the on-line OSMSD resources, using the following five fields: 1) brief description (it summarizes the experiment performed), 2) full description or abstract (it explains more in detail the experiment or has an excerpt from the article’s abstract), 3) source publication (it displays the reference article or the Authors), 4) exact source (it describes the experimental comparison and the level of stringency used), and 5) organism (it shows if the cell line used was human or murine) (Table ). We are aware that the style and content of these five fields would need improvement, and we have also noticed mistakes in this annotation. Nevertheless, since the incomplete or wrong annotation of OSMSD did not interfere with our analyses and we have amended and edited what was needed in the main text, we have left, for consistency, the information of Table as it can be found in the Molecular Signatures Database. 
There are two data features that increase the reliability of this type of analysis: a) all the OSMSD have a number of members ≤ 481 (NFE2L2.V2) and, in the case of the 53 gene sets used for this analysis, ≤ 294 (STK33_NOMO_UP) (Table ). This stringent selection of candidates, based on gene sets having relatively small sizes, reduces the risk that hits within a gene set happen by pure chance; b) all found matches are shown in Table , so that Readers can formulate hypotheses on their own about the meaning of these hits. However, in order to further increase the reliability of the assessments made in the Results, we decided to disregard those gene sets that do not comply with each of these three criteria: (i) the experimental procedures, reported in the manuscript that was used for defining the gene set (Table , column 4), allow estimating/determining the type and effect of the experimental stimulus; (ii) there is a good degree of biological consistency between BNPTP and the biological system or the cells (either transformed or normal) used in the gene set experimental procedure. Alternatively, the cell lines can be regarded, in our judgement or according to the literature, as recipient/model cells of general biological relevance where specific biochemical events are induced and/or tested. A third case is that the gene targeted by the gene set experiment is tested in what seems to be a cell line-specific background, but there is enough evidence that the same gene is important for a spectrum of cancer types or biological phenomena that goes beyond the disease typically modeled by that cell line. 
When this last case is applied, we explain in the text the biological implications of this inclusion (this is what we did for some gene sets identified using leukemia cell lines); (iii) it is possible to determine a specific or non-specific gene/protein targeted by the methods used to induce/inhibit cell growth, using both the Authors’ description and the OSMSD annotation; for example, in some experiments cells were treated with growth factors, but the identification of the proteins, which are up- or down-regulated, was not performed or was not clearly enough explained (similarly for cells that are slowed down). The gene sets marked with an X in Table , column 7, include either those obtained through the downregulation of a (putative) tumor suppressor, or those generated through the upregulation/downregulation of a (proto)oncogene. The downregulation was achieved in knock-out mice (RB_P130_DN.V1_UP) or by RNA-interference (ATM_DN.V1_DN, BMI1_DN.V1_UP, BRCA1_DN.V1_UP, CTIP_DN.V1_UP, HOXA9_DN.V1_UP, P53_DN.V2_DN, P53_DN.V2_UP, PTEN_DN.V1_DN, STK33_NOMO_UP, STK33_SKM_UP, TBK1.DF_DN, TBK1.DN.48HRS_UP); since these techniques are reported to have a high efficiency, we confidently assumed that the target gene/protein has a residual activity close to zero. The upregulation was mostly obtained by overexpressing a transgene using a suitable vector (such as a virus); other techniques include adding a chemical compound targeting a specific protein (ERB2_UP.V1_UP, NOTCH_DN.V1_UP, WNT_UP.V1_DN, WNT_UP.V1_UP) or achieving gene amplification (YAP1_UP). In most cases, the exact amount of upregulation is not quantified (either in terms of cell proliferation or in terms of concentration/activity of the intracellular protein), but we conventionally considered these gene sets as reliable as those based on knock-out mice and RNA-interference, since the described techniques and reagents are largely used in the field and show consistent and reproducible results [57].
Gene sets that do not fulfil the above criteria (26 out of 53) were candidates for exclusion (CFE) from the Results. Overall, gene sets were CFE because: i) we deemed the tissue of origin of these cells not informative enough for making inferences on pediatric/teenage BC and this feature was not balanced by the cell line being considered a model of general relevance (ATF2_S_UP.V1_DN derived from myometrium; CAHOY_ASTROGLIAL derived from astroglia cells; ESC_J1_UP_LATE.V1_DN, ESC_V6.5_UP_EARLY.V1_UP and ESC_V6.5_UP_LATE.V1_DN based on embryoid bodies; PIGF_UP.V1_DN and VEGF_A_UP.V1_UP based on human umbilical cord vein endothelial cells, JAK2_DN.V1_UP of erythroleukemia cells), ii) the target gene alteration was not sufficiently described, either as a mutation of DNA or of the protein sequence or for the effects induced on the protein function (P53_DN.V1_DN and P53_DN.V1_UP), iii) the effect of gene silencing/upregulation is not sufficiently clear in tumorigenesis or has not been exhaustively stated in the relevant manuscript (PRC2_EDD_UP.V1_UP was discarded because the role of the EED gene (a transcriptional repressor, member of the Polycomb group) in oncogenesis is currently unclear and the only mutation in man described so far [58] causes a Weaver-like syndrome, characterized by overgrowth but not cancer; RPS14_DN.V1_UP was discarded because RPS14 encodes a ribosomal protein part of the 40S subunit of the ribosome: its impairment has a general role on protein biosynthesis, but not a “specific” role in carcinogenesis), iv) the cells were forced to differentiate (LEF1_UP.V1_DN and LEF1_UP.V1_UP, where an epithelial to mesenchymal transition (EMT) is induced), v) the cells were treated with growth enhancers or inhibitors having a generic effect on cell proliferation without describing the main genetic targets related to tumorigenesis (CSR_LATE_UP.V1_DN, DCA_UP.V1_DN, GCNP_SHH_UP_LATE.V1_UP, IL15_UP.V1_UP, IL2_UP.V1_DN, IL2_UP.V1_UP, IL21_UP.V1_UP, PDGF_UP.V1_DN, PDGF_UP.V1_UP, RAPA_EARLY_UP.V1_DN and TGFB_UP.V1_DN), vi) the definition of the gene set in the OSMSD is ambiguous and, most likely, the induced effect is not specific enough (LTE2_UP.V1_UP).
Finally, we decided to partially overrule the aforementioned selection criteria by including in the final analyses the two gene sets concerning TP53 that were excluded so far (i.e., P53_DN.V1_DN and P53_DN.V1_UP) because of the importance of this tumor suppressor and of some general evaluations about how these two gene sets were generated. Specifically, the NCI-60 panel of cell lines was screened and two cell line groups were created for TP53: a) 17 carrying a normal p53; b) 33 with a mutant p53 [21]. Generally speaking, it is true that i) each different mutant of p53 behaves differently, ii) some of them exhibit gain of function (enhanced tumorigenicity and resistance to therapy) and others do not [59, 60], and iii) it is challenging to interpret results of a direct comparison between these two groups, since they were created by combining all the mutants. However, it is also true that these two gene sets are based on the presence of mutant p53, so on conditions that, more or less effectively, promote cell proliferation. The “rescue” of these two cases brings the final count of gene sets for this article to 29 used and 24 discarded from the analysis.
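The hit matrix M defined in this section (one row per gene set, one column per gene, 1 for a hit and 0 otherwise) can be illustrated with a short sketch. The in-house MATLAB script is not reproduced in the text, so the Python code below is not the Authors' implementation; the function names, gene set names and gene symbols are hypothetical placeholders.

```python
def build_hit_matrix(gene_sets, genes):
    """Return a binary matrix: rows follow the order of gene_sets,
    columns follow the order of genes; 1 marks a hit."""
    matrix = []
    for members in gene_sets.values():
        member_set = set(members)
        matrix.append([1 if gene in member_set else 0 for gene in genes])
    return matrix


def drop_empty_rows(gene_sets, matrix):
    """Keep only the gene sets with at least one hit."""
    kept = [(name, row) for name, row in zip(gene_sets, matrix) if any(row)]
    return [name for name, _ in kept], [row for _, row in kept]


# Toy input (placeholder names, not real signatures or BNPTP genes).
toy_gene_sets = {
    "SET_A_UP": ["GENE1", "GENE7"],
    "SET_B_DN": ["GENE8", "GENE9"],
    "SET_C_UP": ["GENE2", "GENE1"],
}
toy_genes = ["GENE1", "GENE2", "GENE3"]

hit_matrix = build_hit_matrix(toy_gene_sets, toy_genes)
set_names, hit_rows = drop_empty_rows(toy_gene_sets, hit_matrix)
total_hits = sum(sum(row) for row in hit_rows)
```

Applied to the 189 signatures and the 21 BNPTP genes, the same two steps would yield the 53-row matrix with 60 hits described in the text.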

Standardized Criteria for Commenting the Matrix Hits

Each hit of this matrix has been evaluated in the following sequential way. Step 1: assuming that the background of BNPTP is constantly pro-growth (all being cancer patients), the background of the experiment that generated each gene set is assessed, thus allowing splitting the 29 gene sets that successfully went through the steps above into “generated in pro-growth conditions” and “generated in anti-growth conditions”. Step 2: it is checked if the polarity of the gene set (containing genes upregulated or downregulated) matches the polarity of the activity type of the BNPTP genes (also upregulated or downregulated). BNPTP genes that are conflicting or undefined are sometimes commented in the Results, but we do not consider them as part of our core assessments. Step 3: combining background and polarity, it is possible to generate standard comments, which are based on the assumption that 16 cases are overall possible. Indeed, in a very simplified perspective, the experimental input can determine activation or repression of an oncogene or a tumor suppressor (total: four cases possible). Additionally, the gene set typology is UP or DOWN and the same happens for each BNPTP (UP or DOWN) (total: four cases possible). The combination of the former and latter four cases defines the 16 (= 4 x 4) instances mentioned before. Notably, these variables allow also splitting the 16 cases into two groups with 8 instances each, due to the fact that in half of the cases the biological background produced for the gene set is pro-growth and in the other half is anti-growth; this means that in 50% of the cases there is compatibility between BNPTP and gene set biological background and in the other 50% there is, instead, incompatibility. Of course, these cases are a priori defined and have nothing to do with how many actual matches we found for each of these 16 cases. 
In fact, there is an evident bias in the sort of experiments performed for determining the gene sets (mostly pro-growth) as well as in the type of activity found for each BNPTP gene (mostly UP). Finally, when these analyses allow identifying a possible gene pathway of BNPTP, rather than a single upstream gene, up- and down- stream genes are connected, in the text, using a →; instead, the up- and down- regulation symbols are, respectively, ↑ and ↓.
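The 16 a priori cases of Step 3 can be enumerated explicitly. The sketch below is only an illustration of the 4 x 4 combination described above; the labels and the `background` helper are hypothetical names, and it assumes that activating an oncogene or repressing a tumor suppressor produces a pro-growth background.

```python
from itertools import product

TARGETS = ("oncogene", "tumor suppressor")
INPUTS = ("activation", "repression")
POLARITIES = ("UP", "DOWN")


def background(target, experimental_input):
    """Classify the biological background produced for the gene set."""
    pro_growth = (
        (target == "oncogene" and experimental_input == "activation")
        or (target == "tumor suppressor" and experimental_input == "repression")
    )
    return "pro-growth" if pro_growth else "anti-growth"


cases = [
    {
        "target": target,
        "input": experimental_input,
        "gene_set": gene_set_polarity,
        "bnptp_gene": gene_polarity,
        "background": background(target, experimental_input),
    }
    for target, experimental_input, gene_set_polarity, gene_polarity
    in product(TARGETS, INPUTS, POLARITIES, POLARITIES)
]

# Cases whose background is compatible with the pro-growth BNPTP background.
compatible = [case for case in cases if case["background"] == "pro-growth"]
```

The enumeration gives 16 cases, 8 of which have a background compatible with BNPTP, matching the 50% split stated in the text.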

Gene Identification

Considering that multiple gene names are used in the literature for the same DNA sequence, all genes analyzed in the Results are uniquely identified through the Online Mendelian Inheritance in Man (OMIM; http://www.omim.org) database identification (ID) number. This ID is reported in Table , column 11 (for the genes altered in BNPTP) and in the main text (for the genes that were targeted by the experiments that allowed defining the gene sets collected in the Molecular Signatures Database).
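As an illustration of why a single identifier is useful, the sketch below resolves alternative gene names to one symbol and its OMIM ID. It is only a hypothetical helper, not part of the published workflow: the IDs and the alias pairs are those quoted in the Results and in the Abstract, while the dictionary layout and the function name `resolve` are assumptions made for this example.

```python
# OMIM IDs as quoted in the Results for the genes targeted by the gene-set experiments.
OMIM_BY_SYMBOL = {
    "BMI1": "164831", "HOXA9": "142956", "STK33": "607670", "TP53": "191170",
    "RB1": "614041", "RBL2": "180203", "CCND1": "168461", "BRCA1": "113705",
    "RBBP8": "604124", "KRAS": "190070", "TBK1": "604834", "SMARCB1": "601607",
    "WNT1": "164820", "YAP1": "606608", "ERBB2": "164870",
}

# Alternative names mentioned in the article (upper-cased), mapped to one symbol.
ALIASES = {
    "P53": "TP53", "P130": "RBL2", "CYCLIN D1": "CCND1", "CTIP": "RBBP8",
    "SNF5": "SMARCB1", "WNT-1": "WNT1", "HER2": "ERBB2",
}


def resolve(name):
    """Return (symbol, OMIM ID) for a gene name or alias; raise KeyError if unknown."""
    key = name.strip().upper()
    symbol = ALIASES.get(key, key)
    return symbol, OMIM_BY_SYMBOL[symbol]


for name in ("p130", "CtIP", "cyclin D1", "SNF5", "KRAS"):
    print(name, "->", resolve(name))
```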

RESULTS

Gene Ontology

We analyzed the 21 genes that have been found altered in BNPTP using methods based on gene ontology (GO), which is largely applied to genomics data [20, 61, 62]. We chose an exploratory point of view [63], trying to understand the collective properties of these genes as much as possible, while deemphasizing the importance of the p-values obtained for each GO term, since these measures might be affected by the small number of genes available. In particular, we i) only looked for the most relevant GO terms of biological processes (BP), thus excluding GO terms related either to cellular component (CC) or molecular function (MF), ii) took into account, for our assessments, that some BPs are found because of the original bias of starting from a gene list, which contains cancer-related genes, iii) tried to determine if statistically significant BPs that are apparently unrelated to BNPTP suggested the involvement of relevant, but elusive mechanisms, iv) used the evidence provided by the GO analyses for connecting the identified BPs and important mechanisms and molecules that have been described in BC, and v) minimized the number of GO-BP categories to be discussed using a measure of semantic similarity within an agglomerative process, which is conceptually similar to a hierarchical clustering applied to GO terms [50, 52]; this makes possible selecting redundant and non-redundant GO terms and allows reducing the complexity of the GO-BP graph. The GO-BP summarizing graph (Fig. ) shows two main subgraphs (left, with 9 nodes and right, with 11 nodes), two nodes linked to the node ‘small GTPase mediated signal transduction’, which belongs to the right subgraph, and 4 isolated nodes, which we positioned at the bottom, on the left side of this graph. The left subgraph has stronger semantic connections (displayed as thicker edges, on average) among its nodes, when compared to the right subgraph. 
The right subgraph mainly contains terms related to regulatory mechanisms, cellular organization and replication; three isolated nodes (i.e., ‘regulation of growth’, ‘cell cycle checkpoint’ and ‘death’) are fully or partially related to cell replication as well. Some prominent GO terms of the right subgraph that are expected to be deregulated in cancer cells are: ‘regulation of DNA replication’, ‘regulation of cell proliferation’ and ‘smoothened signaling pathway’. From a complementary perspective, four nodes on the right are related to cell-cell communication (‘regulation of cell-matrix adhesion’) and signal transduction (‘small GTPase mediated signal transduction’, ‘positive regulation of Rac protein signal transduction’ and ‘positive regulation of nucleobase-containing compound metabolism’). The GO term ‘small GTPase mediated signal transduction’ contains the protein NF1 and the three main members of the Ras subfamily (HRAS, KRAS and NRAS) (Tables 3 and S1), which are mutated and/or deregulated in BNPTP. Overall, up to about 30% of all human tumors screened, independently of tissue origin, carry a mutation in one of the RAS genes, especially KRAS [64]. Approximately 13% of BC specimens have a mutation in one of the RAS proteins [65], and this is particularly true for non-muscle invasive BC in adults [66]. Given the chemical nature of GTPases, which hydrolyze guanosine triphosphate (GTP), the presence of the category ‘positive regulation of nucleobase-containing compound metabolism’ is very interesting. Indeed, beyond the direct role of GTP, nucleotide-derived cyclic compounds also seem to play a central role in urogenital cancer development [67], and our GO analysis highlights the importance of these biological processes in BNPTP.
Notably, at least some inhibitors of phosphodiesterases (PDEs), which are critical components of the cyclic AMP/protein kinase A (PKA) and cyclic GMP/protein kinase G (PKG) signaling pathways, are able to induce apoptosis and inhibit cell growth in rodent models of bladder cancer; additionally, one bladder cancer cell line (HT1376) derived from a woman [68] overexpresses PDE5, similarly to human squamous and transitional cell carcinomas, when compared with normal urothelium [69]. Therefore, this last GO-BP term suggests that these biochemical events may be involved not only in adult BC but also in BNPTP. It is crucial for a cell to segregate DNA and cytoplasm to the daughter cells through its molecular mechanisms of cell division. Indeed, genes impairing the normal progression of cytokinesis are either down- or up-regulated in human cancers, according to their cellular role; additionally, some of them have been mapped to chromosomal regions that are either deleted or amplified in tumors or tumor-derived cell lines [70-73]. GO terms such as ‘regulation of intracellular transport’ (inside the right subgraph), ‘actin filament-based process’ and ‘cytoskeleton organization’ (outside the right subgraph) fit quite well with these processes. Human genes causing cytokinesis failure have been implicated in cancer pathogenesis [74, 75], inducing the formation of polyploid cells with abnormal growth. In particular, this is true for bladder cancer [76]. Aneuploidy may be a consequence, among other possibilities, of impaired centrosome function. Indeed, it has been demonstrated that, in some BC specimens, mitotic kinases, such as Aurora A, are significantly overexpressed or amplified [77], and may induce defective centrosome behavior as well as polyploidy [78, 79].
Also related to intracellular movement, as well as to cell-cell communication processes, is the term ‘regulation of synaptic transmission, GABAergic’, although its role is usually associated with the normal function of the nervous system (which here is also represented by the GO term ‘regulation of neuron apoptotic process’). Notwithstanding the absence of a direct link between bladder cancer and vesicle movement in the literature, a hypothetical link may be defined. The correct course of cytokinesis involves the addition of membrane to the cleavage site, to allow furrow ingression and cytodieresis. These phenomena occur through the movement of lipid vesicles originating from the endoplasmic reticulum and modified in the Golgi stacks; failure of this intracellular transport causes cytokinesis impairment in animal models [80]. The parallel between this vesicle movement inside the cell and that of synaptic vesicles is quite straightforward; thus, it is tempting to hypothesize that at least some molecular mechanisms are shared by these two biological phenomena. Altogether, the nodes described so far (both inside the right subgraph and isolated) create a coherent picture of a general deregulation at the cellular level, as expected from specimens of cancer patients. The left subgraph is somewhat less expected, though; its nine nodes have thicker lines than the nodes in the other subgraph (Fig. ), suggesting that their semantic connections are closer. If we consider the whole group of nodes, a relationship emerges between most of the GO terms identified and various processes of embryogenesis and/or development. Six of these nodes are indeed involved in organogenesis (‘gland development’, ‘metanephros development’), tissue differentiation (‘striated muscle cell differentiation’) or body patterning (‘regionalization’, ‘pattern specification process’, ‘embryonic morphogenesis’).
Two nodes of the left subgraph (‘aging’ and ‘cell aging’) are not apparently related to these processes, although it is fascinating to note that they may be considered as opposing the natural, embryonic-related cellular rejuvenation [81, 82]. The only apparently off-topic nodes (Fig. ) seem to be ‘visual learning’ (inside the left subgraph) and ‘response to abiotic stimulus’ (linked only to the node ‘small GTPase mediated signal transduction’). The definition of the former is “any process in an organism in which a change in behavior of an individual occurs in response to repeated exposure to a visual cue”; the definition of the latter is “any process that results in a change in state or activity of a cell or an organism (in terms of movement, secretion, enzyme production, gene expression, etc.) as a result of an abiotic (non-living) stimulus” [83]. Consequently, these two GO terms do share a logical connection, since light is both an abiotic stimulus and a visual cue. As a matter of fact, the link between cyclic rhythms (which depend on the alternation of light and darkness) and human diseases (including cancer) or health issues is not a novelty and has been investigated by chronobiology [84, 85]. Even the young age of the patients that are the focus of our analyses does not seem to be a limitation, except possibly for the youngest cases (see, for instance, Table , row 27), since the emergence of important biological rhythms has been shown within 18 weeks from birth [86]. One of the genes involved in circadian cycles is TP53 [85, 87, 88], which is present in the list of genes that are altered in BNPTP, although without a well-defined status (Table ). This gene belongs to GO:0009416, ‘response to light stimulus’ (Tables S1 and S2), which is one of the five GO terms (out of the 75 identified using EASE) that depend on ‘visual learning’.
Additionally, melatonin ‘is a chemical signal of darkness’ [89] that has also been used for treating cancer patients [90, 91]; melatonin receptors MT1 and MT2 are G-protein coupled receptors that are expressed in various parts of the body, including the bladder [92]. The physiological changes induced by melatonin [92], the role of melatonin levels in aging [93] and apoptosis [94] and, more specifically, the direct effect of night work on bladder cancer formation in adults [95] as well as the bladder role in the excretion of melatonin metabolites at night [94] suggest that the relationships between the genes that belong to the ‘visual learning’ GO category and melatonin might play some role in BNPTP. Finally, if we refer to generic stimuli, any non-endogenous chemical might be involved in the category ‘response to abiotic stimulus’. Indeed, it is generally accepted that some chemicals influence the health of the inner bladder walls, since this tissue comes into contact with any hydrosoluble compound waiting to be excreted in the urine. As a matter of fact, some substances – mostly pollutants – are directly or indirectly linked to bladder cancer formation in adults [65]. However, the contact between compounds and bladder walls usually requires many years to induce the neoplastic transformation [96], thus this situation is not easily applicable to pediatric/teen patients, even conceding that such substances might enter into contact with the fetus of a pregnant mother. For example, in adults the risk of BC is directly proportional to the number of years spent smoking and the exposure to occupational chemicals may cause BC after several decades. Also, hairdressers who performed their jobs for more than 10 years have a five-fold increase of their risk of being diagnosed with BC, while the hazard increases by 3.3 times for people who use permanent hair dyes at least once a month for 15 and more years [96]. 
Overall, in adults the BC incidence increases with age, being most commonly diagnosed in the seventh decade of life [97]. Altogether, data suggest that environmental factors per se (without a genetic predisposition) are not compatible with BC formation in infants and are weakly correlated with BC in teenagers; however, it is evident that the time between first exposure and BC diagnosis is highly variable, probably depending on the stimulus (quality, quantity) and on the individual susceptibility. This quantitative topic deserves, in our opinion, to be investigated further.

Gene Set Analysis and Identification of Possible Upstream Genomic Events

There is an urgent need to formulate hypotheses that, even if partially destined to not be confirmed experimentally, may help us understanding what happens, at a molecular level, when genes are altered (at the level of DNA, mRNA or protein) in BNPTP. In order to increase the chances to find relevant upstream events, either in etiological terms or as parallel biochemical events that target these effector genes, we limited our analysis to gene sets having an oncological relevance, which were described in biological contexts generally compatible with BNPTP (thus raising the conditional probability of finding successful matches). It is important to highlight that, since the average size (i.e., number of member genes) of the 189 oncological gene sets that were used is 165.71 (Table ), the average probability for a gene to be in a gene set by pure chance, looking at each gene set individually, is in the order of 1%. While a match between gene names in the BNPTP gene list and among the gene sets of oncological relevance is defined by a binary answer (namely, “yes, it belongs to” or “no, it is not found in”), the same is not true in terms of biological compatibility between a) experimental input and model system used for defining the gene set and b) nature of the alteration(s) found in the BNPTP genes. For this reason, while we have found 60 hits in the 53 x 21 matching matrix, the cases actually discussed below are only 34; indeed, our comments are limited to the most straightforward matches found and the other gene sets are left to the interpretation of the Readers. 
The gene descriptions provided below, at the beginning of each gene paragraph, are intended as i) helpful information to be used by the Reader when checking the hits of Table and the supplementary data of individual gene sets contained in Table and ii) part of the contextualization approach followed throughout the entire article, since we highlight gene features and molecular mechanisms that are mostly related with cell cycle progression and oncogenesis. The second half of each gene paragraph, instead, is part of the Results section in the most usual way and conceptually depends on the last three paragraphs of the Materials and Methods section. Since the analyses reported below are intended for the tumor samples in which these gene alterations were found in BNPTP, we suggest reading this section on a gene by gene basis and using all the information reported in Table .
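The order of magnitude of the chance-membership probability quoted above (about 1%) can be checked with a short calculation. This is a back-of-the-envelope sketch under a stated assumption: the gene universe is taken to be roughly 20,000 genes, a figure that is NOT given in the article and is used here only for illustration. The variable names are likewise hypothetical.

```python
AVG_SET_SIZE = 165.71   # average size of the 189 oncogenic gene sets (from the text)
N_GENE_SETS = 189       # from the text
N_BNPTP_GENES = 21      # from the text
GENE_UNIVERSE = 20_000  # ASSUMPTION: approximate size of the gene universe

# Probability that one given gene falls in one given gene set by pure chance,
# assuming every gene of the universe is equally likely to be a member.
p_single = AVG_SET_SIZE / GENE_UNIVERSE

# Crude baseline: expected number of gene/gene-set matches under the same
# uniform assumption (not a significance test).
expected_chance_hits = p_single * N_GENE_SETS * N_BNPTP_GENES

print(f"chance probability per gene and gene set: {p_single:.2%}")
print(f"expected chance matches over all pairs: {expected_chance_hits:.1f}")
```

Because the BNPTP genes are cancer-related and the gene sets are oncogenic signatures, the uniform assumption is deliberately naive; the calculation only illustrates where a figure in the order of 1% comes from.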

ACTA2

Alpha-actin-2 (ACTA2) is one of six different actin isoforms that have been identified in vertebrates; in particular, this actin is present in the human aortic smooth muscle. In the aorta, ACTA2 interacts with the beta-myosin heavy chain MYH11 [98]. Recently, some links between ACTA2 and cancer have been found: ACTA2 regulates c-MET and FAK expression in lung adenocarcinoma cells, which positively and selectively influence the metastatic potential [99]. Moreover, the acquisition of ACTA2 expression in the sarcomatous component suggests that an EMT had occurred in the progression to metaplastic breast carcinoma [100]. Interestingly, human malignant melanoma cells release a platelet-derived growth factor-like substance that inhibits the expression of this gene in normal cells [101]. The existing match between this gene and the gene set BMI1_DN.V1_UP [102], based on the silencing of the oncogene BMI1 (OMIM ID: 164831), establishes a link between the anti-growth condition of this gene set and the tumor status of the patient who had an over-expression of the ACTA2 protein (detected by immunohistochemistry). Since the gene set is UP and the gene type is UP too, this match occurs in conflicting experimental/clinical conditions. An analogous case occurs for the gene set HOXA9_DN.V1_UP [103], generated by silencing HOXA9 (OMIM ID: 142956), which might be relevant for its function in sustaining the cell proliferation rate [104], again with an UP/UP polarity for the gene set and this BNPTP gene in the presence of a discordant biological background. The experiment that defined this gene set was performed in acute myeloid leukemia (AML) cells, but since HOXA9 is important in many cancers [105], we decided to treat this hit as a viable one. This gene is also hit by STK33_SKM_UP [106], which was defined upon silencing of STK33 (OMIM ID: 607670) in mutant-KRAS cells.
The precise molecular functions and role of STK33 in these cells are debated [106-109]; however, as far as it is relevant for our analyses, since this gene set was derived in pro-apoptotic conditions and there is an UP/UP polarity for the pair gene set STK33_SKM_UP/gene ACTA2, this is another clear case of two conflicting biological conditions in which this gene is upregulated. Finally, ACTA2 belongs to the gene set P53_DN.V1_DN (mutant p53, pro-proliferative conditions) [21], a gene set that collects genes that are down-regulated in the presence of a mutated p53 (OMIM ID: 191170). In this last case, we have compatibility between the experimental conditions, but a conflict at the level of gene response (DOWN for the gene set and UP for ACTA2). Altogether, these results suggest that the actual upregulation of the ACTA2 protein should be reevaluated, also considering that this evidence is limited to one patient, in order to assess the possibility of correcting its “polarity” (i.e., from UP to DOWN) or, possibly, to understand whether these conflicts between gene sets and status in BNPTP depend on tissue-specificity and/or the peculiar role played by ACTA2 in these malignancies. Notably, with a change of the ACTA2 “polarity”, the first three gene sets mentioned above would suggest that ACTA2 is a critical gene for BNPTP, whose levels go down or up in pro- or anti-proliferative conditions, respectively.
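The reasoning applied to the four ACTA2 hits can be restated as a small decision rule. The sketch below is only an illustration of that logic under stated assumptions: the labels "consistent", "inverse-consistent" and "conflict", the function `assess` and the data layout are hypothetical names introduced here, while the backgrounds and polarities of the four gene sets are those described in the text.

```python
BNPTP_BACKGROUND = "pro"  # Step 1 assumption: BNPTP is always pro-growth

# (gene set, background of the defining experiment, gene-set polarity)
ACTA2_HITS = [
    ("BMI1_DN.V1_UP", "anti", "UP"),
    ("HOXA9_DN.V1_UP", "anti", "UP"),
    ("STK33_SKM_UP", "anti", "UP"),
    ("P53_DN.V1_DN", "pro", "DOWN"),
]


def assess(set_background, set_polarity, gene_polarity):
    """Compare one gene-set hit with the status of the gene in BNPTP."""
    same_background = set_background == BNPTP_BACKGROUND
    same_polarity = set_polarity == gene_polarity
    if same_background and same_polarity:
        return "consistent"
    if not same_background and not same_polarity:
        # Opposite background and opposite polarity: the gene level
        # switches together with the growth conditions.
        return "inverse-consistent"
    return "conflict"


def assess_all(gene_polarity):
    return {name: assess(bg, pol, gene_polarity) for name, bg, pol in ACTA2_HITS}


reported = assess_all("UP")    # ACTA2 as reported in BNPTP
flipped = assess_all("DOWN")   # hypothetical corrected polarity

for name in reported:
    print(f"{name}: UP -> {reported[name]}; DOWN -> {flipped[name]}")
```

With the reported UP status all four hits come out as conflicts, whereas with the hypothetical DOWN status the first three gene sets describe a gene whose level switches with the growth conditions, which is the scenario discussed above.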

ALK

The anaplastic lymphoma kinase (ALK) is a receptor tyrosine kinase having a putative transmembrane domain and both an extracellular (N-terminal) and an intracellular (C-terminal) domain; the kinase activity of ALK resides completely in the intracellular portion of the protein [110]. Its misexpression, consequent to an amplified, mutated, truncated or rearranged protein, may lead to malignant transformation; mutations causing ALK kinase function hyperactivation, or putting the catalytic region under the control of another genetic promoter (as in chromosome translocations) deregulate the enzyme that, in turn, phosphorylates its targets in aberrant times and/or tissues [110]. One of the manuscripts listed in Table reports two pediatric/teen BC involving ALK rearrangements [36]. The Authors took advantage of two DNA probes (orange and green) encompassing the ALK locus, so that they are next to each other in normal cells; since in these two patients the green and orange staining were separated on two different chromosomes in cancer cells, they postulated an ALK rearrangement. Although the fusion partner is not described in this manuscript, in consideration of the nature of the sample (a cancer specimen) and of ALK behavior in known fusion proteins in inflammatory myofibroblastic tumors ([36] and references therein), we assumed that also in these patients there was an upregulation of the activity of the ALK kinase fragment. ALK belongs to the gene set RB_P130_DN.V1_UP [111] and its activity is UP in BNPTP. Since this gene set contains genes that are upregulated in the presence of a double knock-out for RB1 (OMIM ID: 614041) [112] and RBL2 (a.k.a. p130) (OMIM ID: 180203) [113, 114], any mechanism that interferes with the function of these two (putative) tumor suppressors, whose cellular roles are partially overlapped [115], has the potential to worsen this phenotype. 
ALK is also hit by CYCLIN_D1_KE_.V1_DN [116], a gene set based on the overexpression of the putative oncogene CCND1 (a.k.a. cyclin D1) (OMIM ID: 168461). While there is compatibility between gene set and gene biological context (i.e., they are both based on pro-proliferative conditions), the fact that ALK mRNA is downregulated in the gene set and the ALK protein activity is instead upregulated in BNPTP does not provide a direct link between these two molecular events.

CD34

The hematopoietic progenitor cell antigen, cluster of differentiation 34 (CD34) is a cell surface glycoprotein that functions as a cell-cell adhesion factor. It is a transmembrane sialomucin protein expressed on early hematopoietic and vascular-associated tissue [117]. Although its function is still elusive, the data collected suggest that it may be involved in cell morphogenesis and migration, enhanced proliferation, and block of cell differentiation [117]. Interestingly, the up-regulation of the closely related podocalyxin (a member of the CD34 family of proteins) has been related to several malignancies, including breast cancer, prostate cancer, embryonic carcinomas, leukemia and pancreatic cancer [117]. CD34 is one of the most interesting cases of Table . Indeed, the gene set BRCA1_DN.V1_UP [118] hits this gene, and this happens in pro-growth conditions for the experiment defining this gene set (silencing of BRCA1 (OMIM ID: 113705)). Therefore, there is a good chance either that CD34 is downstream of BRCA1 and the deregulation of this tumor suppressor is critical for the onset of BC in young patients, or that the status of BRCA1 is critical for the progression of this type of cancer, being able to further sustain the upregulation of this protein. Notably, a similar relationship can be established for the gene set CTIP_DN.V1_UP [118], which is based on the silencing of the tumor suppressor CtIP (a.k.a. RBBP8 or CTIP) (OMIM ID: 604124), whose association and interaction with BRCA1 have been described [119]. Overall, this analysis shows that there are two tumor suppressors, i.e., BRCA1 and CtIP, whose silencing induces the transcription of the CD34 gene, and in pediatric/teen BC this protein is indeed up-regulated. This experimental evidence deserves further research, looking for cause-effect relationships between BRCA1/CTIP deregulation and upregulation of CD34 in BNPTP.
Additionally, CD34 belongs to KRAS.AMP.LUNG_UP.V1_DN [120], a gene set based on the comparison between the hyper-expression of KRAS (a proto-oncogene; OMIM ID: 190070) that carries the G13V activating mutation and cells without this construct, i.e., in favorable conditions for cell proliferation. Since in the BC patient in which this alteration was found there was, as explained above, a protein up-regulation (Table ), this might be a case where two different mechanisms are involved in BNPTP and in other model systems; alternatively, this information could be relevant for defining how the levels of the CD34 protein are finely tuned, once defined the status of KRAS.

CDKN2A

Cyclin-dependent kinase inhibitor 2A (CDKN2A) is an oncosuppressor having at least three isoforms [121]. It is an inhibitor of cell cycle progression and acts at the G1/S transition by suppressing the action of crucial cyclin-dependent kinases, such as CDK4 and CDK6 [122]. This gene also contains another open reading frame coding for the ARF protein, which acts as an oncosuppressor as well [123]. The deletion or mutation of CDKN2A is frequently associated with neoplastic transformation in several tissues and organs, including the bladder [124], while it has been shown that ARF degradation is inhibited in cancer cells [125], suggesting that p16 and p19/ARF may act in complementary, inversely related cell cycle control pathways. Since oncogenic KRAS-driven cancers require TBK1, and TBK1 silencing induces KRAS-dependent apoptosis (OMIM ID: 604834), the hit of CDKN2A in the gene set TBK1.DN.48HRS_UP [120] suggests that the negative interference with this oncogenic pathway might positively induce CDKN2A, while the tumor-promoting conditions found in BNPTP are able to down-regulate it. Therefore, CDKN2A shows potential to be a gene that switches its levels (UP or DOWN) in response to specific anti- or pro-tumor stimuli, respectively. CDKN2A is also found in the gene set SNF5_DN.V1_DN [126], since the knockout of the tumor suppressor SNF5 (OMIM ID: 601607) lowers its mRNA levels, in pro-proliferative conditions. Therefore, the dysregulation of SNF5 might be a worsening factor for the down-regulation of this protein in BNPTP; instead, we rule out any causal relationship, since the downregulation of CDKN2A occurs at the level of gene deletion (confirmed in 4 patients) and not of mRNA or protein. Considering that SNF5 and SMARCB1 are synonymous, this analysis can be combined with that of SMARCB1 (see below).
Additionally, CYCLIN_D1_KE_.V1_UP also hits this gene [116]; this gene set is based on the overexpression of the putative oncogene CCND1 and therefore there is biological compatibility between what is observed for the gene set at the mRNA level and for these four patients at the protein level. This could be invoked as a control mechanism for the downregulation of CDKN2A, but the fact that using this target gene (i.e., CCND1) would require inducing a cyclin makes this information hardly applicable, in our opinion. Finally, this gene belongs to P53_DN.V1_UP [21]; this means that, in the pro-proliferative conditions that characterize this gene set, the CDKN2A mRNA is upregulated, while the CDKN2A gene is DOWN (being deleted) in BNPTP. These two events are clearly not correlated and do not need to be commented on further.

Gli1 and Gli3

The GLI (glioma-associated oncogene) proteins are transcription factors. They are effectors of the Hedgehog (Hh) signaling pathway and have a role in cell fate determination, proliferation and patterning in many cell types and most organs during embryo development [127]. Their amplification causes neoplastic transformation in the central nervous system, and Northern blot analysis showed that GLI mRNAs are expressed in embryonal carcinoma cells but not in most adult tissues [128]. Gli1 is a recognized oncogene [129] and its over-expression in mice causes the formation of basal cell carcinoma (BCC) [130]. Instead, Gli3 is not related to glioma, BCC or other forms of neoplasia, but is essential for Gli1 expression in the somites during muscle formation [131]. Its mutations are associated with several other diseases, such as Greig cephalopolysyndactyly syndrome, Pallister-Hall syndrome, preaxial polydactyly type IV, and postaxial polydactyly types A1 and B [132]. Gli3 may act as either an activator or a repressor of transcription [133]. GLI1 is one of the genes of the gene set WNT_UP.V1_UP [134]: this is a typical case of a hyper-expressed oncogene (WNT1 (a.k.a. Wnt-1) (OMIM ID: 164820)) that induces the expression of a set of genes. Since this gene set contains only up-regulated genes and GLI1 is up-regulated too in BNPTP, it is possible either that WNT1 is directly upstream of GLI1 or that its activation is capable of worsening the phenotype observed in the presence of elevated levels of GLI1. Another interesting case is the second hit of GLI1: the gene set CYCLIN_D1_KE_.V1_UP [116] is based on the overexpression of CCND1 and there is compatibility both at the level of biological context (pro-proliferative) and type of change (upregulation) between this gene set and this gene; additionally, the molecule involved (i.e., mRNA) is the same.
This makes a relatively strong case for CCND1 being upstream and directly inducing the levels of GLI1 found in BNPTP, or, alternatively, this perturbation of cyclin D1 might worsen the phenotype of these patients. Finally, considering that the upregulation of WNT1 upregulates CCND1 (this has been described as an early biological event) [135], it is possible to hypothesize that the pathway WNT1 ↑ → CCND1 ↑ → GLI1 ↑ is involved in BNPTP. Instead, only one gene set (CSR_LATE_UP.V1_DN) hits GLI3, but it is one of the 24 gene sets that we consider less usable for making this type of inference. So, we cannot conclude anything relevant about possible upstream events for GLI3, based on the available oncological signatures.
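The way pairwise upstream/downstream relations are combined into a single candidate pathway, written with the → and ↑ symbols defined in the Materials and Methods, can be sketched as follows. This is a toy illustration rather than the procedure used in the article: the edge list simply restates the three relations discussed for GLI1, and the helper names `longest_chain` and `render` are hypothetical. The recursion assumes that the relations contain no cycles.

```python
# Directed "upstream -> downstream" relations discussed in the text for GLI1.
EDGES = [("WNT1", "CCND1"), ("WNT1", "GLI1"), ("CCND1", "GLI1")]
# Direction of change of each gene (all upregulated in this example).
STATUS = {"WNT1": "↑", "CCND1": "↑", "GLI1": "↑"}


def longest_chain(edges, start):
    """Return the longest downstream chain from `start` (edges must be acyclic)."""
    children = {}
    for upstream, downstream in edges:
        children.setdefault(upstream, []).append(downstream)
    best = [start]
    for nxt in children.get(start, []):
        candidate = [start] + longest_chain(edges, nxt)
        if len(candidate) > len(best):
            best = candidate
    return best


def render(chain, status):
    """Format a chain with the article's symbols for pathway and regulation."""
    return " → ".join(f"{gene} {status[gene]}" for gene in chain)


pathway = render(longest_chain(EDGES, "WNT1"), STATUS)
print(pathway)
```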

HRAS, NRAS and KRAS

The small GTPase class of proteins called RAS (from the phrase “rat sarcoma”) includes three main members, i.e., HRAS, NRAS and KRAS. They are ubiquitously expressed in all human organs and their role is intracellular signal transmission; they perform this task through conformational changes induced by the hydrolysis of GTP into GDP. These changes are usually a response to an extracellular stimulus, relayed by specific receptors. RAS proteins transmit the signal by activating many biochemical cascades (such as mitogen-activated protein kinases (MAPK)) and in this way control crucial cellular activities, such as cell proliferation, differentiation, and apoptosis [136, 137]. Mutations causing permanent activation of RAS polypeptides are oncogenic [138] and play a central role in bladder cancer formation [65]. Martinelli and collaborators showed that somatic missense mutations in RAS genes represent a recurrent event in pediatric/teenage embryonal RMS, accounting for approximately one fourth of the cases [40]. HRAS is involved in a large number of normal cellular processes, and its upregulation may sometimes be physiological, as during rat liver regeneration [139]. Mutations in HRAS are also responsible for Costello syndrome, a multiple congenital anomaly and mental retardation syndrome [140]. KRAS has two isoforms called KRASA and KRASB, derived by alternative splicing, which differ in the C-terminal regions; these regions are important for post-translational modifications causing alternative trafficking pathways and protein localization [141]. Deregulated KRAS is also frequently involved in neoplastic transformation; in fact, 17 to 25% of all human tumors harbor an activating KRAS mutation [142]. Finally, NRAS also plays a central role in cell proliferation and its uncontrolled activation may lead to neoplastic transformation.
Mutations in position 17 (in all RAS members) produce dominant-inhibitory proteins with higher affinities for exchange factors than normal RAS, impairing their functions [143]. NRAS mutations were identified in Noonan syndrome-6 [144] and in patients with congenital melanocytic nevus syndrome [145]. HRAS is hit by LTE2_UP.V1_UP [146], but this gene set has been discarded based on pre-defined algorithm steps that we used for gene set selection (see Materials and Methods); for this reason, this analysis is unable to provide more biological insight and the main information available is that this gene was found mutated in two BNPTP (Table ). NRAS belongs to the STK33_NOMO_UP gene set [106]; as mentioned in the ACTA2 paragraph, while the precise role of STK33 downstream of mutant KRAS is not fully clear [107-109], and while some caution should be used in making inferences based on AML cell lines, we consider reliable the information of Scholl et al. [106] about the fact that silencing STK33 has a strong anti-proliferative effect. This creates a biological conflict, since the UP/UP status for this gene set and NRAS (protein activity) is associated with anti- and pro- proliferative conditions in these two cellular environments, respectively; ultimately, also considering that the upregulation of the NRAS mRNA is dependent on two events (mutation of KRAS and STK33 knocked down) we consider this outcome inconclusive for NRAS. Instead, the situation of match between NRAS and the gene set YAP1_UP [147] is quite interesting: the over-expression of the YAP1 (OMIM ID: 606608) oncogene [148] induces NRAS, and this makes YAP1 a protein potentially capable to amplify NRAS mRNA levels and worsen the phenotype of these patients. Finally, no gene set hits KRAS, while it is upregulated in BNPTP. 
This outcome raises the odds that the mutation of this proto-oncogene is an early event of this cancer’s onset; therefore, the most important biochemical events concerning this gene are rather found downstream.

KRT20

Cytokeratin 20 is a type I cytokeratin encoded by the KRT20 gene. It is an integral intermediate filament component and a major cytoskeletal keratin of the intestinal epithelium. Its principal localization is in the intestinal and gastric mucosa, and in several other epithelia; indeed, it is also present in superficial (and, occasionally, intermediate) cells of the bladder urothelium (urothelial umbrella cells). As such, it can be used to identify a range of adenocarcinomas arising from epithelia, and by immunohistochemistry it is frequently found in colorectal cancer, transitional cell carcinomas and Merkel cell carcinoma [149]. In combination with CK7, it is a useful marker of bladder cancer [150]. We found a hit for this gene: in fact, a KRAS mutation downregulates KRT20 (gene set KRAS.600.LUNG.BREAST_UP.V1_DN [151]), whose activity is UP in our collection of clinical reports. The examined KRAS mutation is found in breast and lung malignancies, and, overall, these data point towards a tissue-specific KRT20 status.

MUC1

Mucin 1, cell surface associated (MUC1) is a transmembrane mucin (high molecular weight, heavily glycosylated protein) whose function is to protect tissues from pathogen-mediated infections, but it is also involved in signal transduction [152]; it is an oncoprotein. Being an epithelial protein, its expression is usually associated with carcinomas (of colon, breast, ovary, lung and pancreas), but it has also been found in mesenchymal tumors (such as synovial sarcoma and ovarian granulosa cell tumors) [153]. Some Authors hypothesize that its upregulation gives an advantage to cancer cells against the anti-tumor immune response [154]. Moreover, it has also been shown that the cytoplasmic portion of MUC1 may interact with p53, promoting the anti-apoptotic properties of the latter [155]. Apoptosis may also be impaired by the MUC1-mediated phosphorylation of Akt, causing the up-regulation of Bcl-2 and Bcl-xl that in turn prevent the release of cytochrome c from the mitochondria [156]. Finally, the over-expression of MUC1 promotes the stabilization of beta-catenin, resulting in the initiation of EMT, which promotes invasiveness [157]. This gene belongs to a very interesting gene set, ERB2_UP.V1_UP [158], produced by collecting the genes that become upregulated after inducing the oncogene ERBB2 (OMIM ID: 164870); since the protein MUC1 is upregulated too, ERBB2 should be considered either as a possible upstream gene for MUC1 or as potentially involved in a pathway that impacts the same gene in pro-proliferative conditions.

MYF5 and MyoD1

Myogenic factor 5 (MYF5) is a key protein in the regulation of muscle differentiation; MYF5 and MyoD1 (myogenic differentiation 1) are transcription factors belonging to the family of proteins known as myogenic regulatory factors (MRFs). Without the simultaneous presence of Myf5 and MyoD, myogenic cells fail to progress normally during the determination stage of myogenesis [159]. MYF5 and MyoD1, and particularly the latter, are able to bind hundreds of muscular gene promoters and drive myoblast proliferation; MyoD1 cooperates with the Retinoblastoma protein (Rb) in transcribing later markers of differentiation [160] and in inducing cell cycle arrest in terminally differentiated myoblasts, through the regulation of Cyclin D1 [161]. NOTCH_DN.V1_UP is a gene set created by investigating the cell cycle deregulation induced by NOTCH [162], a gene having a rather elusive, context-dependent nature [163], and MYF5 belongs to it. Since it has been described that in BC NOTCH acts as a tumor suppressor [164] and our evaluations refer, as much as possible, to the biological background of BNPTP, the following analysis treats NOTCH as an oncosuppressor. MYF5 is up-regulated both in the conditions tested for the definition of this gene set and in BNPTP, in the presence of what we assume would be a pro-growth stimulus on the side of the gene set and of a tumor background in BNPTP. Technically speaking, this might be a case where NOTCH signals upstream of MYF5, directly or indirectly. However, the considerable difference between these two biological models (indeed, this gene set was defined in T-cell acute lymphoblastic leukemia cell lines) and the contradictory nature of NOTCH highlighted above suggest caution about establishing this biochemical link.
Instead, no inducer gene or mechanism can be hypothesized for explaining the downregulation of MyoD1, since no hit was found for it, which brings up the hypothesis that MyoD1 acts as a specific biomarker of pediatric/teen BC.

NCAM1

Neural cell adhesion molecule 1 (NCAM1) is a glycoprotein expressed on the surface of neurons, glia, skeletal muscle and natural killer cells. It has a role in cell-cell and cell-matrix adhesion [165], neurite outgrowth, synaptic plasticity, learning and memory [166]. NCAM shares many features with immunoglobulins and indeed is considered a member of the immunoglobulin superfamily [167]. Some reports link its expression to cancer, especially to neuroblastoma, malignant lymphomas of T-NK cell origin, multiple myeloma, melanoma, some cancers of epithelial origin [168], small cell lung cancer, rhabdomyosarcoma, brain tumors, acute myeloid leukemia [169] and, occasionally, large B-cell lymphoma [170, 171]. The NCAM1 protein is found up-regulated in one patient, but is downregulated in AKT_UP.V1_DN [172]. Since this gene set derives from the overexpression of Akt1, an oncogene (OMIM ID: 164730), we are in the presence of an antagonistic upstream effect on NCAM1, a fairly good example of a gene which may play different roles in different tissues or contexts [173]. Additionally, the gene set KRAS.600.LUNG.BREAST_UP.V1_UP [151] is also hit by this gene; this suggests that oncogenic KRAS might be signaling towards NCAM1 or that it has the potential to worsen the observed phenotype. This result, while important, does not add new key genes to the BNPTP gene list, since KRAS is already among those of primary relevance for these patients. This gene also belongs to MTOR_UP.V1_DN [172], which means that with a pro-proliferative cell background (upregulation of MTOR; OMIM ID: 601231) NCAM1 is downregulated, while, in the pro-growth conditions of BNPTP, the same gene is upregulated. This outcome suggests, among other possibilities, that pediatric/teen BC is possibly an MTOR-independent tumor [174].

NF1

Neurofibromin 1 (NF1) is the gene responsible for neurofibromatosis type I (named NF1 as well), a tumor disorder affecting the nervous system. NF1 is a cytoplasmic protein predominantly expressed in neurons, Schwann cells, oligodendrocytes, and leukocytes. It plays an important role in cell proliferation pathways, such as the RAS-cyclic AMP pathway and the ERK/MAP kinase cascade, in adenylyl cyclase activation, and in cytoskeletal assembly; mutations are usually discovered in the heterozygous state [175]. In particular, it acts on HRAS by inducing the hydrolysis of GTP, thus inactivating it; consequently, NF1 acts as a tumor suppressor [176]. Moreover, mutations in this gene have been identified in other clinical conditions, such as juvenile myelomonocytic leukemia [176], Watson syndrome [177], desmoplastic neurotropic melanoma (DNM) [178], glioblastoma [179], and breast cancer [180]. Notably, while the nature of NF1 can hardly be defined by looking at the literature about BNPTP (Table ), the hit matrix based on OSMSD points in the direction of an upregulation: we suggest investigating this topic further. Indeed, we found that: i) it belongs to P53_DN.V2_UP [181], which is a gene set where the tumor suppressor TP53 is silenced, thus leading to the upregulation of a number of genes, among which there is NF1. For consistency with this result, the odds that the NF1 mRNA is upregulated are higher than for the opposite hypothesis; ii) TBK1.DF_DN [120] is also hit by NF1; for this gene set, the gene TBK1, upon which the proto-oncogene KRAS relies for cancer induction, is silenced, thus making the cell condition less favorable for growth. This would better match the case that NF1 is upregulated, which, in turn, would raise its chances to be a very sensitive gene, which reacts to pro- or anti- growth stimuli moving its levels in opposite directions.
Instead, if this conflict was resolved assuming that NF1 is downregulated in BNPTP, this would be the case of a gene that is DOWN independently of the growth condition, and rather depending on the biological context; iii) NF1 belongs to WNT_UP.V1_UP [134]: therefore, a pro-proliferative experimental stimulus (due to the upregulation of WNT1) induces NF1 mRNA. If NF1 is actually upregulated, this means that WNT1 might be an upstream inducer or, at least, a worsening factor for BNPTP.
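The hit matrix referred to in the gene paragraphs above can be pictured as a binary gene-by-gene-set table. The following is only a minimal sketch, assuming a plain dictionary of hits as input; the function name `build_hit_matrix` and the data layout are illustrative choices and not the procedure described in the Materials and Methods. The three genes and seven gene sets are those reported above for KRT20, NCAM1 and NF1.

```python
# Hits reported in the text: gene -> oncogenic signatures that contain it.
# The dictionary layout is an assumption made for illustration only.
HITS = {
    "KRT20": ["KRAS.600.LUNG.BREAST_UP.V1_DN"],
    "NCAM1": ["AKT_UP.V1_DN", "KRAS.600.LUNG.BREAST_UP.V1_UP", "MTOR_UP.V1_DN"],
    "NF1": ["P53_DN.V2_UP", "TBK1.DF_DN", "WNT_UP.V1_UP"],
}


def build_hit_matrix(hits):
    """Return (genes, gene_sets, matrix) with matrix[i][j] == 1 when
    gene i belongs to gene set j, and 0 otherwise."""
    genes = sorted(hits)
    # Keep only gene sets with at least one hit, as in the binary matrix.
    gene_sets = sorted({gs for members in hits.values() for gs in members})
    matrix = [[1 if gs in hits[g] else 0 for gs in gene_sets] for g in genes]
    return genes, gene_sets, matrix


genes, gene_sets, matrix = build_hit_matrix(HITS)

# Print one row per gene, followed by the gene sets it hits.
for gene, row in zip(genes, matrix):
    hit_names = [gs for gs, flag in zip(gene_sets, row) if flag]
    print(gene, row, hit_names)
```

Each row sum gives the number of signatures hitting a gene, and each column sum the number of BNPTP genes contained in a signature.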

PTCH1

Patched 1 is a tumor suppressor, transmembrane protein and a receptor for the secreted molecule ‘sonic hedgehog’ that plays a role in the formation of embryonic structures and in tumorigenesis. PTCH1 acts as an inhibitor of the ‘smoothened’ protein, a G protein-coupled receptor [182]; when ‘sonic hedgehog’ binds PTCH1, smoothened is released and signals cell proliferation [183]. Interestingly, Gli1 regulates PTCH expression in a cell type-specific manner [184]. Mutations in PTCH1 cause Gorlin syndrome, basal cell carcinomas (BCC), nevoid basal cell carcinoma syndrome (NBCCS), medulloblastoma and rhabdomyosarcoma [185]. E2F3_UP.V1_UP [186] is a gene set containing genes up-regulated upon over-expression of E2F3 (OMIM ID: 600427) [187]; since in both cases (BNPTP and the experimental conditions tested for defining this gene set) we are in pro-cell growth conditions, we deem either that E2F3 is upstream of PTCH1, since its mRNA is also upregulated in BNPTP, or that E2F3 has the potential to worsen the phenotype of these patients.

PTPN11

Tyrosine-protein phosphatase non-receptor type 11 is an intracellular enzyme that is widely expressed in human tissues and is particularly abundant in heart, brain, and skeletal muscle. It is involved in mitogenic activation, metabolic control, transcription regulation, and cell migration. Dominant mutations of this gene can cause Noonan syndrome [188], LEOPARD syndrome [189], juvenile myelomonocytic leukemia [190], and metachondromatosis [191]. PTPN11 mutations, although at low frequency, are also found in several other human cancers [192]. These data suggest that PTPN11 is a proto-oncogene; however, a recent report shows that it may also act as a tumor suppressor, at least in hepatocellular carcinoma [193]. We found a match between a gene set, ATM_DN.V1_DN [181], generated in pro-growth conditions (silencing of the tumor suppressor ATM (OMIM ID: 607585)), and PTPN11, which is included in the list because of an activating mutation, with a DOWN/UP conflict between gene set and gene. An analogous situation occurs for the gene set P53_DN.V2_DN [181], where the silencing of TP53 down-regulates a number of genes, and among them PTPN11. Considering that ATM phosphorylates and activates TP53 [194], a general pathway based on these genes is the following: ATM ↓ TP53 ↓ PTPN11 ↓. If the pediatric/teen BC data (PTPN11 UP) were confirmed, this might suggest that PTPN11 activation in BNPTP is independent of this tumor suppressor signaling pathway. The possibility of regulating the upstream levels of either or both of these tumor suppressors to counteract the activation of PTPN11 looks hardly applicable in medicine, since it would involve “tampering” with tumor suppressors.

SMARCB1

SWI/SNF-related matrix-associated actin-dependent regulator of chromatin subfamily B member 1 (SMARCB1) is a chromatin remodeling factor. It is a subunit of the SWI/SNF ATP-dependent chromatin-remodeling complex, whose function is to relieve repressed chromatin states and promote gene expression. It is a tumor suppressor that is frequently deleted in pediatric/teen malignant rhabdoid tumors (MRT) [195]. Mutations in this gene were also identified in i) choroid plexus carcinomas, ii) a subset of central primitive neuroectodermal tumors and medulloblastomas [196], iii) schwannomatosis 1 and iv) meningiomas [197]. Its loss of function in MRT-derived cells causes polyploidy and chromosomal instability, a condition that can be reverted upon its re-expression [198]. Interestingly, it has also been demonstrated that SMARCB1 is able to control the expression of Gli proteins, by acting as a negative regulator; thus, it influences the Gli-Hh signaling pathway (see above) [199]. In the presence of inactivation of the Snf5 tumor suppressor (OMIM ID: 601607), i.e., of the experimental conditions that allowed defining the gene set SNF5_DN.V1_DN [126], SMARCB1 is down-regulated; however, since SMARCB1 and Snf5 are synonyms, it is expected that the mRNA levels of a knocked out gene drop. On the side of BNPTP, SMARCB1 is DOWN, at the level of gene deletions, and this gene alteration is quite important (5 cases are reported in Table ). Notably, out of 189 gene sets tested, only one, which is defined by experimentally targeting this gene, is able to lower the expression of SMARCB1 mRNA; this suggests that the oncogenicity of SMARCB1 neither depends on nor is modulated by the cancer-related upstream events that are collected in the OSMSD.

TP53

Tumor protein p53 (TP53) is a tumor suppressor gene and the most frequently mutated gene (up to 50%) in human cancer [200]. Despite its name, derived from its molecular weight, it is now known that the TP53 gene may encode at least 12 different protein isoforms, ranging from 28 to 53 kDa [201]. TP53 encodes proteins that bind to DNA and regulate gene expression to prevent mutations of the genome [202]. It has been called the “guardian of the genome”, as it plays important roles in apoptosis, angiogenesis, genomic stability, miRNA processing, cell cycle control, aging, and induced pluripotent stem cell generation; its protein isoforms are regulated by phosphorylation, ubiquitination, acetylation, methylation, through the miRNA targeting of its transcripts, and by the interaction with other proteins. This gene is only hit by the gene set P53_DN.V2_UP [181], where the silencing of TP53 induces the upregulation of the TP53 mRNA, possibly because the cells used try to compensate for this knockdown. However, the not fully understood reason for this gene set outcome, the fact that it is hard to make this type of “circular” inference (the targeted gene in the gene set is also one of the differentially expressed genes), and the observation that the behavior of TP53 in BNPTP is somewhat ambiguous (Table ), make this case, in our opinion, completely undefined.

VIM

Vimentin (VIM) is a type III intermediate filament whose main function is the localization and anchoring of the organelles in fixed positions inside the cytosol [203]; in the form of a secreted phosphorylated homodimer, it also plays a role in the immune response [204]. Typically, VIM is expressed in mesenchymal tissues, and for this reason it is frequently used as a specific marker for mesenchymally derived tissues and for their transformed counterparts, i.e., sarcomas. VIM belongs to PTEN_DN.V1_DN [205], which means that when the tumor suppressor PTEN (OMIM ID: 601728) is silenced (pro-growth condition) VIM is downregulated. However, the VIM protein is actually upregulated in BNPTP, thus providing evidence for an upstream event that might reduce its level of activation, but in pro-growth conditions. Such data allow making several hypotheses: i) this upregulation might not be a key event in pediatric/teen BC, ii) there is a specificity in BNPTP with respect to this gene, iii) the upstream event (silencing of PTEN) is a possible control mechanism for the upregulation of VIM, but would need to be very finely tuned. WNT_UP.V1_DN [134], generated in pro-growth conditions (upregulation of WNT1), is also hit by VIM. The discrepancy between the gene set (DOWN) and gene (UP) polarity allows making some hypotheses, ranging from the possibility of evaluating the use of this event as an upstream regulator of VIM, which is complicated by the nature of the possible action (upregulation of an oncogene), to simply assuming that VIM has an independent behavior with respect to this tumor progression pathway. Finally, VIM hits P53_DN.V1_DN (a gene set based on mutant p53 and, as such, identified in samples that are in pro-proliferative conditions) [21].
However, in this gene set VIM is DOWN at the mRNA level, so we have another case of DOWN/UP conflict between gene set and gene; this outcome suggests that in pediatric/teen bladder cancers VIM is regulated in a p53-independent way.
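The comparison of polarities used throughout these gene paragraphs (gene set direction versus gene status in BNPTP) can be sketched as follows. This is a minimal illustration, assuming that the final `_UP` or `_DN` token of a signature name gives the direction of its member genes, as the descriptions above imply; the function names are hypothetical and not part of any published tool.

```python
def gene_set_polarity(gene_set_name):
    """Direction of the member genes, read from the last '_'-separated token
    of the signature name (assumed naming convention: UP or DN)."""
    suffix = gene_set_name.rsplit("_", 1)[-1]
    if suffix == "UP":
        return "UP"
    if suffix == "DN":
        return "DOWN"
    raise ValueError(f"no UP/DN suffix in {gene_set_name!r}")


def compare_polarity(gene_set_name, status_in_bnptp):
    """Label a hit as 'concordant' or 'conflict' given the gene status
    ('UP' or 'DOWN') reported for the patients."""
    if status_in_bnptp not in ("UP", "DOWN"):
        raise ValueError("status must be 'UP' or 'DOWN'")
    same = gene_set_polarity(gene_set_name) == status_in_bnptp
    return "concordant" if same else "conflict"


# VIM is reported as upregulated in BNPTP; its three hits are all DN sets.
vim_hits = ["PTEN_DN.V1_DN", "WNT_UP.V1_DN", "P53_DN.V1_DN"]
vim_calls = {gs: compare_polarity(gs, "UP") for gs in vim_hits}
print(vim_calls)
```

A label of this kind only flags the DOWN/UP mismatch; its interpretation still depends on whether the gene set experiment was run in pro- or anti-growth conditions, which the sketch does not encode.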

WT1

Wilms tumor protein 1 (WT1) is a transcription factor and plays a central role in the normal development of the urogenital system. WT1 acts both as a tumor suppressor and as an oncogene, according to the splice form involved. It causes an embryonic kidney malignancy, namely the Wilms’ tumor; some Authors proposed that this neoplastic transformation is mediated by abnormally persistent renal stem cells (nephrogenic rests), which retain embryonic differentiation potential [206]. This gene is only hit by gene sets (namely, ATF2_S_UP.V1_DN, ESC_V6.5_UP_EARLY.V1_UP and LEF1_UP.V1_UP) that we have decided not to include in this analysis, since they do not meet the quality standards defined for the gene set selection. Readers interested in learning more about possible upstream events for this gene, whose protein is upregulated in BNPTP, can especially refer to the last two, since they share the same polarity (i.e., UP) as WT1 in BNPTP.

DISCUSSION

There is no particular reason to hypothesize that rare diseases in general and, more specifically, rare types or subtypes of cancer, have a different level of complexity when compared with more common diseases [207, 208]. Intriguingly, personalized medicine is rather showing that even common pathologies (e.g. cancers with the highest morbidity) are a collection of rare genomic diseases [209]. However, there is a clear and remarkable difference between common and rare diseases: the former are usually the subjects of strong research efforts, which generate abundant data and useful knowledge, while the investigations performed and financial resources invested for the latter are far fewer, sometimes negligible. Data available for infrequent pathologies are sparse and often based on single cases, which are treated by physicians living in different countries and, as such, are hardly seen as a whole [210, 211]. Additionally, the level of knowledge transmission is generally insufficient [212] and many physicians and researchers who directly treat or investigate these cases lack the training for properly performing multi-step data analyses similar to those presented in this paper or deal with “knowledge barriers”, which may have many causes [213]. These facts make it much more challenging to find effective strategies for fighting rare diseases, in particular when it comes to the contextualization (i.e., the enhancement of our understanding of a biological object X, taking into consideration known biological factors, conditions, and mechanisms that are in some relationship with X) of the gene alterations found in individual patients. Therefore, every computational or methodological approach that rigorously connects and leverages the existing information is a welcome addition to the literature.
The goal of this paper is dual: on the one side to understand the aggregate properties of genes that have been described as altered in BNPTP, using GO-based methods, and on the other side i) to suggest a possible etiology for the gene alterations found or ii) to determine oncological mechanisms that regulate/deregulate/act upstream of the same genes. The shortage of data available made this goal hard to achieve for several reasons. First of all, in most cases only one patient is linked to a specific gene deregulation, thus the data collected so far would greatly benefit of additional reports, possibly obtained using similar methods, in order to confirm these gene roles. Moreover, the classification of these tumor samples is not always straightforward; for example, the gene SMARCB1 was described both in pure and malignant rhabdoid tumors, but the Authors of these papers (see Table ) do not specify if these two tumors should be regarded as two different subtypes or not. In general, classification issues may help to explain why i) the same tumor is or is not positive for the same gene, and ii) different research groups characterized the same tumor based on different markers (see, for example, [15] and [45]). The identified genes are heterogeneous not only because there are biological differences among distinct tumor types/subtypes, but also because the techniques used by different research groups are different and methods have been dramatically improving (for example, the research of Stratton et al. [38] was published when DNA microarrays did not exist). We decided to pursue a GO approach for the following reasons: 1) GO-based methods rely on categorical data analysis and this allows statistically assessing which biological themes are relevant despite the level of gene heterogeneity; 2) we were looking for biological processes that could be identified without any bias due to the analyst’s knowledge about this subject; 3) the article published by Hoadley et al. 
about the Pan-Cancer-12 collection [214] shows that the organ involved in a malignancy often determines a distinct genomics signature, and we were interested in understanding which biological factors join the tumors listed in Table . Notably, looking at the 21 genes of BNPTP independently shows a strong predominance of proliferative genes; however, this approach fails to detect a number of biological processes, which are associated with statistically significant GO-BP p-values and may be involved as well in this cancer onset. Since we are aware that some of these p-values are overly optimistic and may be driven by the limited number (i.e., 21) of genes available, we used a semantic selection process, which reduced the most important GO-BPs to 26. Looking at the semantic network of these 26 GO categories, we found that the interplay between genes involved in differentiation/development and regulatory pathways may be crucial: in particular, the role of Rac signal transduction, of the cytoskeleton and of response mechanisms to abiotic stimuli deserve to be investigated further. The fact that the analyzed genes are ontologically grouped into two relatively coherent subgraphs suggests that each of them may represent a main route of cancer development in these young patients. Focusing on the left subgraph of Fig. , there are no clear clues about why most of its GO terms showed up. However, the specific nature and behavior of pediatric/teen BC [2] might partially explain this outcome. In particular, it is tempting to envisage that alterations during the urogenital development may influence the normal homeostasis of bladder cells in young patients, ultimately leading to the neoplastic transformation. Indeed, there seems to be a specific age threshold dividing pediatric/teenage and adult bladder patients, set at around 19 years of age [2, 17]. 
Therefore, a reasonable hypothesis is that the transformation relies on the left subgraph of GO-BP terms, while the GO categories of the right subgraph are more related to abnormal cell growth and cancer development. As in every similar analysis, the GO categories found in this network can be better understood looking at the biological features of the genes that belong to each GO term (see Tables 3 and S1). Indeed, in an extremely simplified way, the 21 genes found in BNPTP can be grouped into four classes, based on their potential activity: i) CD34, WT1, ACTA2, and VIM may play heterogeneous roles with respect to cellular differentiation; ii) Gli1, Gli3, PTCH1, SMARCB1, MYF5, and MyoD1 may specifically affect muscle (one of the two main tissues of the bladder) cell differentiation; iii) KRT20, Muc1, and NCAM1 may specifically affect the epithelial (the other main tissue of this organ) differentiation; iv) ALK, CDKN2A, RAS proteins, NF1, PTPN11, and TP53 may drive and/or contribute to this carcinogenesis with complex and multifaceted modalities. To better understand the divide between pre-adults and adults, genes involved in adult bladder cancer should be analyzed with GO algorithms identical or equivalent to those used in this paper, looking for differences and similarities in the GO terms retrieved. 
As for the gene set analysis, we have been exploiting the knowledge produced by GSEA and related methods and the information stored in the Molecular Signatures Database because: a) GSEA, PAGE, GLAPA, ASSESS and similar methods show that it is statistically beneficial aggregating tens or hundreds of genes and analyzing them as a whole, for robustness [21-24]; b) results previously obtained applying gene set-based methods prove that it is effective and useful making inferences on biomedical samples through statistical tests where the hypotheses that sets of genes act as biological “cliques” are based on experiments performed on heterogeneous biological models, both in vitro and in vivo; c) gene set information has a precise directionality and polarity (an experimental stimulus induces a set of genes and represses another set of genes) and all the genes belonging to a gene set are “weighted” in the same way; d) the presence of a gene in a gene set is associated with a probability (calculated as (# genes in the gene set) / (# genes in the genome)) that, even in the most unfavorable case (for NFE2L2.V2, which is the largest gene set of OSMSD), is well below 5% (i.e., p-value < 0.05) and usually is (much) lower; e) the Molecular Signatures Database collects and annotates gene sets of oncological relevance and makes these data publicly available for analyses and meta-analyses [21]. The hit matrix of BNPTP has many types of information, some of good and some of insufficient biological quality (evaluated in terms of compatibility between type of gene set experiment and biological status of these patients), as it is expected by a broad-spectrum analysis like this. 
However, the well-defined 34 cases that we comment in this paper as hits supported by some type of biological evidence (either concordant or discordant, in terms of compatibility between pro- and anti- growth background, with what found in altered genes of BNPTP), allowed us to formulate novel hypotheses about the involvement of some (proto)oncogenes and (putative) tumor suppressors; hopefully, this list of candidate genes will stimulate more biological and clinical research on bladder cancer in pre-adult patients. The case of ACTA2 is quite compelling, since its status in BNPTP is in disagreement with all the indications provided by our gene set analysis; therefore, this is the case of a meta-analysis that hypothesizes the redefinition of a gene role. Redefinitions and reassessments are not uncommon outcomes of meta-analyses; for instance, the aforementioned paper by Hoadley et al. allowed reevaluating the subtype definition of breast cancer samples, showing that luminal and HER2 (i.e., ERBB2/HER2) subtypes can effectively be joined into a unique group, from a genomics standpoint [214].
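The chance-membership probability mentioned in point d) of the gene set rationale above, i.e., (# genes in the gene set) / (# genes in the genome), can be illustrated with a short calculation. This is only a sketch: the genome size of 20,000 genes and the gene set size of 500 are placeholder values assumed for the example, not the figures used in this study nor the actual size of NFE2L2.V2.

```python
def membership_probability(gene_set_size, genome_size):
    """Probability that a given gene falls in a gene set by chance alone,
    computed as (# genes in the gene set) / (# genes in the genome)."""
    if genome_size <= 0:
        raise ValueError("genome_size must be positive")
    if not 0 <= gene_set_size <= genome_size:
        raise ValueError("gene_set_size must lie between 0 and genome_size")
    return gene_set_size / genome_size


# Placeholder values (assumptions for illustration only).
ASSUMED_GENOME_SIZE = 20_000
ASSUMED_LARGE_GENE_SET_SIZE = 500

p = membership_probability(ASSUMED_LARGE_GENE_SET_SIZE, ASSUMED_GENOME_SIZE)
print(f"chance probability of membership: {p:.3f}")
print("below the 0.05 threshold:", p < 0.05)
```

With these placeholder numbers the ratio is 500 / 20,000 = 0.025; smaller gene sets give proportionally lower values.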

CONCLUSION

In adults, more than 50% of muscle invasive BC (MIBC) samples harbor mutations in the gene TP53, which is the major gene player of BC [28]. Instead, our PubMed searches found only one patient out of 25 examined (i.e., 4%) with an involvement of TP53 (Table ) and the results concerning this gene obtained through our computational analyses are inconclusive/undefined. The case of this 18 year old male is also quite ambiguous, since he had two alterations with apparently opposite effects (namely, an early stop codon with, likely, a loss of function, and an overexpression, probably dependent on the other allele (see the Results section)). Remarkably, TP53 may be either inactivated or upregulated in adult BC samples [28], yet this usually does not happen in the same patient [28]. Additionally, the oncogene FGFR3 (OMIM ID: 134934), which is the second most mutated gene in adult BC (found upregulated in up to 80% of non-MIBC and 40% of MIBC [28]) to the best of our knowledge has never been found mutated in BNPTP, even when mutations of this gene were specifically looked for [17]. Consequently, the two main genes involved in BC formation in adults seem to be less important in BNPTP. Notably, our results allowed identifying two genes potentially involved in BC of pediatric and teen patients, namely CTIP and WNT1, which are not established BC markers in adults. 
More specifically, a PubMed search using “CTIP bladder” does not find any relevant paper (as to mid-June, 2015), while a search based on the same phrase with common web search engines allows retrieving only three articles: i) one analyzing CTIP variants in MIBC, mostly in the framework of DNA damage signaling and repair, which did not find any significant association between carriage of the called variants and overall survival [215], ii) one showing the cell cycle-specific expression of CTIP and its interaction with BRCA1 in a BC cell line [216], which is consistent with our analyses about CD34 (see the paragraph about this gene in the Results), and iii) one false-positive result due to the fact that the Authors (working on NIH 3T3 cells) were using as a reference the previous paper [217]. A similar PubMed search was performed for WNT1. In this second case we were able to retrieve two papers: a) one investigating the relationship between WNT1 and BC, but in an indirect and purely associative way (involving the gene TERE1, whose locus is indeed related to BC) [218], and b) one even more indirect discussing Wnt and Fgf genes and reporting the association between a cluster of Fgf genes and BC [219]. Based on these query outcomes, the involvement of CTIP and WNT1 in the signaling pathways of BNPTP, if confirmed experimentally, would be a novelty in the genetic landscape of urothelium transformation. Our analyses allowed identifying seven other genes potentially involved in BNPTP and whose presence was somehow more expected, since their role in adult BC is supported by the literature. These genes are: 1) ERBB2 [220, 221], 2) CCND1 [222], 3) YAP1 [223, 224], 4) BRCA1 [225, 226], 5) RB1 [227, 228], 6) RBL2/p130 [229, 230], and 7) E2F3 [231]. An analysis of the expression patterns and mechanisms of these seven genes in adults compared (as much as possible) with BNPTP is omitted, because it is beyond the scope of this article. 
Altogether, our results and the literature suggest that some BC genes are likely active in an age-dependent fashion while others are shared between pre-adults and adults; this would explain the observed differences in the formation, development and behavior of BC in these two broad age groups [2]. Some of our results may diverge markedly from the outcomes in adults also because children have BC types/subtypes that are extremely rare in adults (for example, RMS or rhabdoid tumors) [232]. It is our belief that contextualization methods are among the most powerful tools for making gene therapy, personalized medicine and advanced medical approaches [233-235] available also to BNPTP and, more generally, to patients with rare diseases. We anticipate that some hypotheses described in this article may not be confirmed experimentally; however, several mechanisms and genes that are listed in the Results section are novel and warrant future research for precisely defining their role in BNPTP. Additionally, the set of supplementary data and the key steps for using them (see the Materials and Methods section) allow anyone who is interested in this subject to formulate more hypotheses, possibly expanding the grid of analyzed cases beyond what we have already done. In the light of the recent production of genomics data about BC in adults [236] and of the differential BC responsiveness to therapy based on genomics subtypes [237], it would be very valuable to connect and compare pediatric/teen and adult cases as extensively as possible at the molecular level. It is our intention to continue investigating these bladder malignancies using ad hoc computational methods, which aim to fill the gap of knowledge between pre-adults and adults and to propose computationally-derived therapeutic strategies. Supplementary material is available on the publisher's Web site along with the published article.
Table 1.

Reported pediatric and teenage patients with malignant bladder masses.

| Bladder Cancer Type/subtype | Number of Cases Retrieved from the Literature | Incidence On Total Bladder Cancers in Pediatric and Teenage Patients | References |
|---|---|---|---|
| rhabdomyosarcoma | some thousands | 20% of all rhabdomyosarcomas, which are 4-8% of all malignant pediatric tumors | [3,4] |
| transitional cell carcinoma | ca. 150 | 0.4% | [5-7] |
| leiomyosarcoma | a few tens | 0.1% | [8] |
| urachal adenocarcinoma | a few tens | 0.17%-0.34% | [4,9] |
| adenocarcinoma of the exstrophied bladder | less than 100 | < 1/50,000 newborn with exstrophied bladder | [10] |
| inflammatory myofibroblastic tumor | 36 | < 0.1% | [11] |
| mesonephric and clear cell adenocarcinoma | 19 | < 0.03% | [12] |
| perivascular epithelioid cell neoplasm | 1 | extremely rare | [13] |
| paraganglioma/pheochromocytoma | 10 | 2/10^6 cases per year | [14] |
| pure malignant rhabdoid tumor | 8 | extremely rare | [15] |

Column 1: bladder cancer type/subtype;

column 2: number of cases described in the literature;

column 3: incidence on total pediatric/teen bladder cases;

column 4: references. Data were retrieved from [2] and integrated with the most recent bibliography available. Listed references refer to the primary or most relevant sources used for generating this table.

Table 2.

Cases of pediatric/teen BC retrieved through PubMed.

| Gene (a) | Gene Name in Molecular Signatures Database | Molecular Function | Cellular Function | Neoplasm | Alteration | Additional Information | Age (b) | Sex | Ref. | OMIM ID |
|---|---|---|---|---|---|---|---|---|---|---|
| p53 | TP53 | transcription factor | oncosuppressor | pTa low grade (c) | mut/over | CAA->TAA stop codon at position 136 (exon 5); overexpression evaluated by immunohistochemistry | 18 | male | 17 | 191170 |
| p16/Ink4 | CDKN2A | cdk inhibitor | oncosuppressor | pTa low grade | del | FISH (UroVysion) | 14 | male | 17 | 600160 |
| p16/Ink4 | CDKN2A | cdk inhibitor | oncosuppressor | pTa low grade | del | FISH (UroVysion) | 10 | male | 17 | 600160 |
| p16/Ink4 | CDKN2A | cdk inhibitor | oncosuppressor | pTa high grade | del | FISH (UroVysion) | 17 | male | 17 | 600160 |
| p16/Ink4 | CDKN2A | cdk inhibitor | oncosuppressor | PUNLMP | del | FISH (UroVysion); presence of aneuploidy for portions of chromosome 8 | 18 | male | 17 | 600160 |
| CK20 | KRT20 | intermediate filament | cytoskeleton | pTa low grade (c) | over | immunohistochemistry | 18 | male | 17 | 608218 |
| CK20 | KRT20 | intermediate filament | cytoskeleton | pTa low grade | over | immunohistochemistry | 18 | male | 17 | 608218 |
| Ptch1 | PTCH1 | receptor of shh | embryo development, oncosuppressor | RMS | over | Affymetrix gene expression profile | n/a | n/a | 31,32 | 601309 |
| Gli1 | GLI1 | transcription factor | differentiation | RMS | over | Affymetrix gene expression profile | n/a | n/a | 31,32 | 165220 |
| Gli3 | GLI3 | transcription factor | embryo development | RMS | over | Affymetrix gene expression profile | n/a | n/a | 31,32 | 165240 |
| Myf5 | MYF5 | transcription factor | muscle differentiation | RMS | over | Affymetrix gene expression profile | n/a | n/a | 31,32 | 159990 |
| MyoD1 | MYOD1 | transcription factor | muscle differentiation | RMS | under | Affymetrix gene expression profile | n/a | n/a | 31,32 | 159970 |
| NF1 | NF1 | negative regulator of ras | oncosuppressor | RMS | del | large deletion of the whole gene on one chromosome, evaluated by microsatellite markers; other allele apparently normal (no nucleotide sequencing available) | 1 | male | 34 | 613113 |
| NF1 | NF1 | negative regulator of ras | oncosuppressor | RMS | und | neurofibromatosis | 1 | male | 35 | 613113 |
| ALK | ALK | receptor tyrosine kinase | CNS development | IMT | arr | immunohistochemistry | 14 | male | 36 | 105590 |
| ALK | ALK | receptor tyrosine kinase | CNS development | IMT | arr | immunohistochemistry | 5 | female | 36 | 105590 |
| H-RAS | HRAS | GTPase | oncogene | carcinoma | und | Costello syndrome patient | 12 | male | 37 | 190020 |
| K-RAS | KRAS | GTPase | oncogene | RMS | mut | K13Asp | 4 | female | 38 | 190070 |
| H-RAS | HRAS | GTPase | oncogene | transitional cell carcinoma | und | n/a, gene inferred by the patient being affected by Costello syndrome | 13 | female | 39 | 190020 |
| N-RAS | NRAS | GTPase | oncogene | RMS | mut | CAA->AAA in exon 2 causing Q61K | n/a | n/a | 40 | 164790 |
| N-RAS | NRAS | GTPase | oncogene | RMS | mut | CAA->? in exon 2 causing Q61 change (unspecified) | n/a | n/a | 40 | 164790 |
| PTPN11 | PTPN11 | tyrosine-phosphatase | mitogenic activation | RMS | mut | GAG->AAG in exon 3 causing E69K | n/a | n/a | 40 | 176876 |
| K-RAS | KRAS | GTPase | oncogene | urachal adenocarcinoma | mut | G12S | 18 | n/a | 41 | 190070 |
| SMARCB1/INI1 | SMARCB1 | chromatin structure regulator | gene activation | malignant rhabdoid tumor | del | immunohistochemistry; confirmed by multiplex ligation probe amplification; large deletion of the locus | 3 | male | 42 | 601607 |
| SMARCB1/INI1 | SMARCB1 | chromatin structure regulator | gene activation | malignant rhabdoid tumor | mut | 750insC on one allele; Del exon6 on the other allele | 6 m | n/a | 43 | 601607 |
| SMARCB1/INI1 | SMARCB1 | chromatin structure regulator | gene activation | malignant rhabdoid tumor | mut | homozygous deletion of exon 6 | 0 m | n/a | 43 | 601607 |
| SMARCB1/INI1 | SMARCB1 | chromatin structure regulator | gene activation | malignant rhabdoid tumor | micro-del | c.20_43delinsT in one allele; deletion of the other allele | 5 m | female | 44 | 601607 |
| SMARCB1/INI1 | SMARCB1 | chromatin structure regulator | gene activation | pure rhabdoid tumor | lack of immunohistochemical staining | immunohistochemistry | 17 (d) | female | 15 | 601607 |
| CD34 | CD34 | cell-cell adhesion factor | cell proliferation | pure rhabdoid tumor | over | immunohistochemistry | 17 (d) | female | 15 | 142230 |
| CD56 | NCAM1 | cell-cell adhesion factor | currently unclear | pure rhabdoid tumor | over | immunohistochemistry | 17 (d) | female | 15 | 116930 |
| WT1 | WT1 | transcription factor | development of the urogenital system | pure rhabdoid tumor | over | immunohistochemistry | 17 (d) | female | 15 | 607102 |
| VIM | VIM | intermediate filament | cytoskeleton | pure rhabdoid tumor | over | immunohistochemistry | 4 (e) | female | 45 | 193060 |
| ACTA2 | ACTA2 | actin | cytokinesis, cell movement | pure rhabdoid tumor | over | immunohistochemistry | 4 (e) | female | 45 | 102620 |
| MUC1 | MUC1 | mucin | cell signaling and protection | pure rhabdoid tumor | over | immunohistochemistry | 4 (e) | female | 45 | 158340 |

Column 1: gene names as reported in the manuscripts, which are listed in column 10;

column 2: gene names according to the Molecular Signatures Database, which was used for performing the gene set-based analyses;

column 3: main molecular function of the protein encoded by that gene;

column 4: most relevant cellular function of this protein;

column 5: neoplasm affecting the patient(s);

column 6: molecular alteration, which is reported by or deduced from the bibliographic reference(s);

column 7: in case of gene sequencing, the mutation is reported; in case of protein function analysis or mRNA quantification, the method used is reported;

column 8: patient age; unspecified cases are patients that are surely under 19 years old, but whose exact age is unknown;

column 9: patient sex; n/a means that this information is not available;

column 10: reference(s);

column 11: gene identification number in the OMIM database (URL: www.ncbi.nlm.nih.gov/omim). Abbreviations: FISH – Fluorescent In Situ Hybridization; PUNLMP – Papillary Urothelial Neoplasm of Low Malignant Potential; CNS – Central Nervous System; RMS – rhabdomyosarcoma; IMT – inflammatory myofibroblastic tumor; mut – mutation, point mutation; over – overexpression; under – underexpression; del – deletion; arr – rearrangement; und – undefined mutation. Notes: (a) duplicate lines indicate different patients with mutations in the same gene; (b) age is expressed in years, unless otherwise specified with an “m” next to the number, indicating “months”; (c) the same patient has both mutations; (d) all alterations belong to the same patient; (e) all alterations belong to the same patient.

Table 3.

GO-BP terms selected by DAVID and having a p-value < 0.01.

| GO-Term | Count | p-value |
|---|---|---|
| GO:0007569~cell aging | 5 | 1.38E-07 |
| GO:0008542~visual learning | 4 | 7.90E-06 |
| GO:0043523~regulation of neuron apoptosis | 5 | 8.18E-06 |
| GO:0042127~regulation of cell proliferation | 9 | 8.52E-06 |
| GO:0007632~visual behavior | 4 | 1.21E-05 |
| GO:0035022~positive regulation of Rac protein signal transduction | 3 | 1.24E-05 |
| GO:0007568~aging | 5 | 1.81E-05 |
| GO:0045935~positive regulation of nucleobase, nucleoside, nucleotide and nucleic acid metabolic process | 8 | 1.97E-05 |
| GO:0051173~positive regulation of nitrogen compound metabolic process | 8 | 2.41E-05 |
| GO:0010557~positive regulation of macromolecule biosynthetic process | 8 | 2.67E-05 |
| GO:0031328~positive regulation of cellular biosynthetic process | 8 | 3.59E-05 |
| GO:0009891~positive regulation of biosynthetic process | 8 | 3.95E-05 |
| GO:0009416~response to light stimulus | 5 | 4.42E-05 |
| GO:0046822~regulation of nucleocytoplasmic transport | 4 | 8.10E-05 |
| GO:0007612~learning | 4 | 8.96E-05 |
| GO:0045941~positive regulation of transcription | 7 | 1.20E-04 |
| GO:0035020~regulation of Rac protein signal transduction | 3 | 1.36E-04 |
| GO:0010628~positive regulation of gene expression | 7 | 1.41E-04 |
| GO:0032386~regulation of intracellular transport | 4 | 1.48E-04 |
| GO:0010604~positive regulation of macromolecule metabolic process | 8 | 1.49E-04 |
| GO:0009628~response to abiotic stimulus | 6 | 1.60E-04 |
| GO:0045944~positive regulation of transcription from RNA polymerase II promoter | 6 | 1.67E-04 |
| GO:0009314~response to radiation | 5 | 1.87E-04 |
| GO:0046579~positive regulation of Ras protein signal transduction | 3 | 2.46E-04 |
| GO:0051057~positive regulation of small GTPase mediated signal transduction | 3 | 2.79E-04 |
| GO:0032228~regulation of synaptic transmission, GABAergic | 3 | 2.79E-04 |
| GO:0051146~striated muscle cell differentiation | 4 | 2.80E-04 |
| GO:0007224~smoothened signaling pathway | 3 | 3.50E-04 |
| GO:0060341~regulation of cellular localization | 5 | 4.24E-04 |
| GO:0048169~regulation of long-term neuronal synaptic plasticity | 3 | 4.29E-04 |
| GO:0007265~Ras protein signal transduction | 4 | 4.71E-04 |
| GO:0045893~positive regulation of transcription, DNA-dependent | 6 | 5.33E-04 |
| GO:0007611~learning or memory | 4 | 5.54E-04 |
| GO:0051254~positive regulation of RNA metabolic process | 6 | 5.54E-04 |
| GO:0051223~regulation of protein transport | 4 | 5.98E-04 |
| GO:0070201~regulation of establishment of protein localization | 4 | 7.12E-04 |
| GO:0042692~muscle cell differentiation | 4 | 7.12E-04 |
| GO:0001952~regulation of cell-matrix adhesion | 3 | 7.13E-04 |
| GO:0009967~positive regulation of signal transduction | 5 | 8.15E-04 |
| GO:0042981~regulation of apoptosis | 7 | 8.16E-04 |
| GO:0043067~regulation of programmed cell death | 7 | 8.60E-04 |
| GO:0010941~regulation of cell death | 7 | 8.76E-04 |
| GO:0048598~embryonic morphogenesis | 5 | 9.45E-04 |
| GO:0048732~gland development | 4 | 9.78E-04 |
| GO:0032880~regulation of protein localization | 4 | 0.001042703 |
| GO:0010647~positive regulation of cell communication | 5 | 0.001222605 |
| GO:0048168~regulation of neuronal synaptic plasticity | 3 | 0.001269391 |
| GO:0044093~positive regulation of molecular function | 6 | 0.001351732 |
| GO:0040008~regulation of growth | 5 | 0.001395832 |
| GO:0006915~apoptosis | 6 | 0.001524138 |
| GO:0012501~programmed cell death | 6 | 0.001628036 |
| GO:0008285~negative regulation of cell proliferation | 5 | 0.001721827 |
| GO:0001656~metanephros development | 3 | 0.001892643 |
| GO:0010810~regulation of cell-substrate adhesion | 3 | 0.002067039 |
| GO:0043524~negative regulation of neuron apoptosis | 3 | 0.002535107 |
| GO:0033157~regulation of intracellular protein transport | 3 | 0.002735058 |
| GO:0003002~regionalization | 4 | 0.002888256 |
| GO:0008219~cell death | 6 | 0.003325104 |
| GO:0007010~cytoskeleton organization | 5 | 0.003420399 |
| GO:0016265~death | 6 | 0.003426397 |
| GO:0046578~regulation of Ras protein signal transduction | 4 | 0.003459311 |
| GO:0006357~regulation of transcription from RNA polymerase II promoter | 6 | 0.003488245 |
| GO:0009953~dorsal/ventral pattern formation | 3 | 0.003491407 |
| GO:0006275~regulation of DNA replication | 3 | 0.003723492 |
| GO:0048167~regulation of synaptic plasticity | 3 | 0.003962608 |
| GO:0030036~actin cytoskeleton organization | 4 | 0.004251627 |
| GO:0030029~actin filament-based process | 4 | 0.005087539 |
| GO:0051056~regulation of small GTPase mediated signal transduction | 4 | 0.005759671 |
| GO:0043085~positive regulation of catalytic activity | 5 | 0.006401214 |
| GO:0007389~pattern specification process | 4 | 0.006759105 |
| GO:0000075~cell cycle checkpoint | 3 | 0.007859257 |
| GO:0001822~kidney development | 3 | 0.00871315 |
| GO:0007406~negative regulation of neuroblast proliferation | 2 | 0.008839397 |
| GO:0007264~small GTPase mediated signal transduction | 4 | 0.009734957 |
| GO:0051090~regulation of transcription factor activity | 3 | 0.009975498 |

Column 1: 75 GO-Terms (GO code and GO category name);

column 2: number of genes of BNPTP that are found inside that GO category;

column 3: p-value.
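
As a rough illustration of how an enrichment p-value of the kind listed in Table 3 can arise, the following Python sketch computes a plain hypergeometric upper-tail probability and applies the p < 0.01 cut-off. It is not DAVID's exact procedure (DAVID reports a modified Fisher exact score), and the background size, the term labels and the per-term annotation totals are invented placeholders; only the list size of 21 genes and the threshold come from the text.

```python
from math import comb


def upper_tail(hits, list_size, term_size, background):
    """P(X >= hits) when list_size genes are drawn without replacement from
    a background of `background` genes, `term_size` of which carry the term."""
    top = min(list_size, term_size)
    favourable = sum(
        comb(term_size, i) * comb(background - term_size, list_size - i)
        for i in range(hits, top + 1)
    )
    return favourable / comb(background, list_size)


GENE_LIST_SIZE = 21   # the 21 BNPTP genes
BACKGROUND = 14000    # ASSUMPTION: illustrative number of annotated background genes
P_CUTOFF = 0.01       # threshold used for Table 3

# (label, genes of the list found in the term, ASSUMED term size in the background)
toy_terms = [
    ("toy term A", 5, 40),
    ("toy term B", 2, 900),
]

selected = []
for label, hits, term_size in toy_terms:
    p = upper_tail(hits, GENE_LIST_SIZE, term_size, BACKGROUND)
    if p < P_CUTOFF:
        selected.append((label, hits, p))

# Most significant terms first, as in Table 3.
selected.sort(key=lambda row: row[2])
for label, hits, p in selected:
    print(f"{label}\t{hits}\t{p:.2E}")
```

With these made-up inputs, a small term containing five of the 21 genes passes the cut-off, whereas a large term containing two of them does not; real term sizes would have to be taken from the annotation source actually used.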

Table 4.

Summary of the annotation of the 21 BNPTP genes and “hit matrix” based on the OSMSD.
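
The idea of a binary “hit matrix” can be sketched in Python as follows. This is a minimal illustration, not the authors' pipeline: the gene symbols are those of column 2 of Table 2, whereas the three signature names, their memberships and the `OTHER_GENE_*` fillers are hypothetical toy data, not real OSMSD content.

```python
# The 21 BNPTP genes, named as in column 2 of Table 2.
BNPTP_GENES = [
    "TP53", "CDKN2A", "KRT20", "PTCH1", "GLI1", "GLI3", "MYF5",
    "MYOD1", "NF1", "ALK", "HRAS", "KRAS", "NRAS", "PTPN11",
    "SMARCB1", "CD34", "NCAM1", "WT1", "VIM", "ACTA2", "MUC1",
]

# HYPOTHETICAL signatures: names and memberships are invented for illustration.
toy_signatures = {
    "TOY_SIG_1": {"TP53", "CDKN2A", "OTHER_GENE_1"},
    "TOY_SIG_2": {"OTHER_GENE_2", "OTHER_GENE_3"},
    "TOY_SIG_3": {"ACTA2", "VIM", "TP53"},
}


def build_hit_matrix(genes, signatures):
    """Keep only signatures with at least one hit and return
    (kept_signature_names, {gene: [0/1 flag per kept signature]})."""
    kept = [
        name for name, members in signatures.items()
        if any(gene in members for gene in genes)
    ]
    rows = {
        gene: [1 if gene in signatures[name] else 0 for name in kept]
        for gene in genes
    }
    return kept, rows


kept, rows = build_hit_matrix(BNPTP_GENES, toy_signatures)
print("gene\t" + "\t".join(kept))
for gene in BNPTP_GENES:
    print(gene + "\t" + "\t".join(str(flag) for flag in rows[gene]))
```

Signatures without any hit (here `TOY_SIG_2`) are dropped, which mirrors the reduction from the 189 signatures to the 53 having at least one hit described in the Abstract.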
