Literature DB >> 32066756

Machine Learning and Bioinformatics Models to Identify Pathways that Mediate Influences of Welding Fumes on Cancer Progression.

Humayan Kabir Rana1, Mst Rashida Akhtar2, M Babul Islam3, Mohammad Boshir Ahmed4, Pietro Lió5, Fazlul Huq6, Julian M W Quinn7, Mohammad Ali Moni8,9.   

Abstract

Welding generates and releases fumes that are hazardous to human health. Welding fumes (WFs) are a complex mix of metallic oxides, fluorides and silicates that can cause or exacerbate health problems in exposed individuals. In particular, WF inhalation over an extended period carries an increased risk of cancer, but how WFs may influence cancer behaviour or growth is unclear. To address this issue we employed a quantitative analytical framework to identify the gene expression effects of WFs that may affect the subsequent behaviour of the cancers. We examined datasets of transcript analyses made using microarray studies of WF-exposed tissues and of cancers, including datasets from colorectal cancer (CC), prostate cancer (PC), lung cancer (LC) and gastric cancer (GC). We constructed gene-disease association networks, identified signaling and ontological pathways, clustered protein-protein interaction network using multilayer network topology, and analyzed survival function of the significant genes using Cox proportional hazards (Cox PH) model and product-limit (PL) estimator. We observed that WF exposure causes altered expression of many genes (36, 13, 25 and 17 respectively) whose expression are also altered in CC, PC, LC and GC. Gene-disease association networks, signaling and ontological pathways, protein-protein interaction network, and survival functions of the significant genes suggest ways that WFs may influence the progression of CC, PC, LC and GC. This quantitative analytical framework has identified potentially novel mechanisms by which tissue WF exposure may lead to gene expression changes in tissue gene expression that affect cancer behaviour and, thus, cancer progression, growth or establishment.

Entities:  

Mesh:

Substances:

Year:  2020        PMID: 32066756      PMCID: PMC7026442          DOI: 10.1038/s41598-020-57916-9

Source DB:  PubMed          Journal:  Sci Rep        ISSN: 2045-2322            Impact factor:   4.379


Introduction

Welding processes join rigid material pieces (usually metal) at their contact interface by using high temperatures to cause fusion. This process can be hazardous because it exposes the operator to extremely toxic fumes and to radiant energy[1]. The International Agency for Research on Cancer (IARC) has recognized WFs and UV radiation from welding as Group 1 carcinogens[2]. WFs are mainly composed of metallic oxides, silicates and fluorides, including those of magnesium, manganese, zinc, aluminum, beryllium, copper, chromium, cadmium, lead, iron, nickel and vanadium[3]. Welders inhaling WFs in large quantities over a long period run a significantly elevated risk of developing certain types of cancer[1,2]. These metastatic diseases involve uncontrolled or neoplastic growth of cancer cells that arise after the accumulation of genomic mutations, but other factors with powerful effects on cancer behaviour and growth include genetic factors and environmental factors the suffer is exposed to[4]. Environmental factors include inhaled toxic fumes that affect the lungs and enter the circulation to reach many tissues, and which can affect cellular gene expression of cancer cells and thereby their behaviour, survival, growth and invasiveness. Thus, influences such as WF inhalation affects the progression of many types of cancers, including those focused on in this study, specifically CC, PC, LC and GC, which are among the cancers most commonly linked with WF exposure[5-7]. The aim of this study is therefore to identify mechanisms through which WFs may increase cancer incidence. LC is one of the most lethal types of cancer and globally is a leading cause of death[1,2,8]. WFs contain toxic metallic oxides and silicates that directly affect the sensitive tissues of the lung when inhaled, the manner of exposure (by inhalation) makes this the cancer with the highest risk for welders[9]. CC arises in the colon and the rectum and has a typical 5-year survival rate of about 60%. It damages colon or rectum by uncontrollable and invasive cell growth[10]. Iron, aluminum and magnesium oxide of the welding fumes are known to affect the incidence of CC[9], although this is not well understood. PC affects prostate, the gland which produces seminal fluid and controls the transportation of sperm[11]. Nitrogen oxides, carbon dioxide and phosgene are risk factors for prostate neoplasms that are found in WFs[9]. GC (gastric or stomach cancer)[12] is linked to exposures to nickel, beryllium and cobalt oxides which are all present in WFs[9]. In this study, we developed a systematic and quantitative network-based approach to investigate the effects of WFs on gene expression and how these effects may give a clue as to how they encourage the incidence and progression of cancers through affecting pathways and pathway genes that are also altered in these cancers. Thus, we compared gene expression effects of WF exposure with the altered pattern of gene expression seen in CC, PC, LC and GC. This involved, firstly, analyzing differentially expressed gene profiles, then filtering these genes through gene-disease association networks, signaling and ontological pathways, and protein-protein interaction networks. We also investigated the importance of genes and pathways thus identified by using the gold benchmark databases dbGaP and OMIM to identify evidence to support the involvement of these genes in pathological processes such as cancer development. Moreover, we analysed patient survival and its association with the genes that are dysregulated in both the WF-exposed tissue and the four types of cancers. The influence on cancer patient survival of these identified genes provides evidence for their involvement in WF effect on cancer progression.

Methods and Materials

Overview of the analytical approach

We applied an analytical approach to identify links between WF exposure and the incidence of the cancers by employing selected microarray datasets shown in the block diagram of the applied analytical approach shown in Fig. 1. This quantitative approach used genes differentially expressed in WF exposure, and identifies those that are also common to the differentially expressed genes observed in each cancer study. Further, these shared or common differentially expressed genes were used to construct gene-disease (diseasome) association network, identify signaling and ontological pathways, protein-protein interaction (PPI) network and survival function analysis. This approach also used gold benchmark databases OMIM and dbGaP validate genes and pathways identified in our study as showing possible disease associations.
Figure 1

Flow-diagram of the analytical approach used in this study.

Flow-diagram of the analytical approach used in this study.

Datasets employed in this study

To identify the gene expression dysregulation that is common to WFs and the four types of cancers under investigation, we analyzed gene expression microarray datasets from the National Center for Biotechnology Information (NCBI). We examined five different microarray datasets with accession numbers GSE62384, GSE25071, GSE55945, GSE10072 and GSE2685[13-17]. Dataset GSE62384 was produced using human upper airway epithelial cells (RPMI 2650) exposed to spark generated WFs. These data were generated from cells exposed to WFs for 6 hours continuously at low (85 μg/m3) and high (760 μg/m3) concentrations. The CC dataset (GSE25071) consists of microarray data taken from 17 colorectal cancer sufferers who had late-onset CC (mean age 79 years) and 24 patients with early-onset CC (mean age 43 years). The PC dataset (GSE55945) is a microarray data on RNA taken from radical prostatectomy tissue from prostate cancer patients at the Beth Israel Deaconess Medical Center which compared tissue from PC sufferers (Gleason score 6 or 7) with normal prostate tissue. The LC dataset (GSE10072) contained microarray data comparing normal lung tissue and lung adenocarcinoma tissue collected from 26 former smokers, 20 non-smokers (who never smoked) and 28 current smokers; gene expression data are reported by comparing 49 non-tumor and 58 tumor lung tissues. The GC dataset (GSE2685) contains microarray data from 22 gastric cancer and 8 non-cancerous gastric tissues. To analyze the patient survival association of the altered genes that are common to WFs and the four types of cancers under investigation, we retrieved clinical and RNAseq data for CC, PC, LC and GC from the cBioPortal [18,19]. In the clinical dataset of CC (Colorectal Adenocarcinoma, TCGA, Nature 2012) there are 585 samples with 24 features. The samples of CC have RNAseq gene expression data included 224 cases with 224 mutated genes[20]. The clinical dataset of PC (Prostate Adenocarcinoma, TCGA, Cell 2015) includes 333 samples with 86 features. The RNAseq gene expression data of PC has 333 cases with 333 mutated genes[21]. The LC clinical dataset (Lung Adenocarcinoma, TCGA, PanCancer Atlas) consists of 566 samples with 81 features. The samples of LC have RNAseq gene expression data included 510 cases with 566 genes[22]. The clinical dataset of GC (Stomach Adenocarcinoma, TCGA, Nature 2014) contains 295 samples with 52 features. The samples of GC have RNAseq gene expression data included 265 cases with 295 mutated genes[23]. We employed six clinical factors (ethnicity, anatomical site of cancer, histological grade of cancer, primary tumour site, and neoplasm status with tumour) to analyze the survival of the altered genes that are common to WFs and the four types of cancers under investigation. The summarized description of the datasets is shown in Tables 1 and 2.
Table 1

Summarized description of the datasets used for gene expression and enrichment analysis.

Sl.Disease nameGEO accessionNumber of samples
CaseHealthy
1Welding fumes (WFs)GSE623841806
2Colorectal Cancer (CC)GSE250714604
3Prostate Cancer (PC)GSE559451308
4Lung cancer (LC)GSE100725849
5Gastric Cancer (GC)GSE26852208
Table 2

Summarized description of the datasets used for survival prediction.

Sl.Disease nameDatasets name in the cBioPortalNumber of samples
PatientsClinical featuresRNA-SeqMutated genes
2Colorectal Cancer (CC)Colorectal Adenocarcinoma (TCGA, Nature 2012)58524224224
3Prostate Cancer (PC)Prostate Adenocarcinoma (TCGA, Cell 2015)33386333333
4Lung cancer (LC)Lung Adenocarcinoma (TCGA, PanCancer Atlas)56681510566
5Gastric Cancer (GC)Stomach Adenocarcinoma (TCGA, Nature 2014)29552265295
Summarized description of the datasets used for gene expression and enrichment analysis. Summarized description of the datasets used for survival prediction.

Analysis methods

Microarray-based gene expression analysis is a global and sensitive method to identify and quantify possible molecular mechanisms that underlie human disorders[24]. We used these approaches to analyze the gene expression profiles of CC, PC, LC and GC to find the genetic effects of WFs that may influence the development of these cancers. To allow comparisons of the mRNA expression data generated using different platforms and to avoid complications arising from the different experimental systems employed in the original studies, we normalized the gene expression data by means of Z-score transformation (Z) for each type of cancer tissue gene expression profile using , where SD denotes the standard deviation, g denotes the value of the gene expression i in sample j. After this transformation gene expression values of different diseases at different platforms can be directly compared. We applied unpaired t-tests to find differentially expressed genes of each disease over control data and selected significantly dysregulated genes. We have chosen a threshold of at least 1 log2 fold change and a p-value for the t-tests of . We employed the neighborhood-based benchmark and the multilayer topological methods to find gene-disease associations. We constructed a gene-disease network (GDN) using the gene-disease associations, where the nods in the network represent either gene or disease. This network can also be recognized as a bipartite graph. The primary condition for a disease to be connected with other diseases in GDN is they should share at least one or more significant dysregulated genes. Let is a specific set of diseases and G is a set of dysregulated genes, gene-disease associations attempt to find whether gene is associated with disease . If and , the sets of significantly dysregulated genes associated with diseases D and D respectively, then the number of shared dysregulated genes associated with both disorders D and D is as follows[25]: The common neighbours are the based on the Jaccard Coefficient method, where the edge prediction score for the node pair is as[26]:where G is the set of nodes and E is the set of all edges. We used R software packages “comoR”[27] and “POGO”[28] to cross check their genes-disease associations. To investigate how molecular determinants from the WF exposed tissues relate gene expression alterations in the cancers, we analyzed pathway and gene ontology using Enrichr [29,30]. We used KEGG, WikiPathways, Reactome and BioCarta databases for analyzing signaling pathway[31-34]. We used GO Biological Process and Human Phenotype Ontology databases for ontological analysis[35,36]. We also constructed a protein-protein interaction sub-network for each CD, using the STRING database, a biological database and web resource of known and predicted protein-protein interactions[37]. Furthermore, we examined the validity of our study by employing two gold benchmark databases OMIM and dbGaP. To determine the patient survival association of the altered genes that are common to WFs and the four types of cancers under investigation, we employed Cox PH model for univariate and multivariate analysis[38,39]. The Cox PH model can be written as follows:Here is the hazard function conditioned on a subject with covariate information given as the vector , is the baseline hazard function which is independent of covariate information, and β represents a vector of regression coefficients to the covariates correspondingly. We have calculated the hazard ratio (HR) based on the estimated regression coefficients from the fitted Cox PH model to determine whether a specific covariate affects patient survival. The HR for a covariate can be expressed by the following simple formula exp . Thus, the HR for any covariate can be calculated by applying an exponential function to the corresponding coefficient. The survival status of a patient can be estimated by calculating PL estimator[40] of the survival function can be defined as follows:Here is estimated survival function at time t, d is the number of events occurred at t, and n is the number of subjects available at t. After estimating survival function, two or more groups can be compared using a log-rank test. We used Log-rank tests to detect the most significant genes in the case of patient’s survival time in altered versus normal (non-altered) groups in context of gene expression. The null hypothesis for this test can be symbolically explained as follows:Here H0 is survival functions that are the same for altered and normal gene and H is survival functions that are not the same for these two groups. If the survival function of a specific gene is different among altered and normal groups then we include it to the combined Cox PH model. This approach is efficient for learning the effect of a specific gene of interest on patient survival in the presence of the clinical factors.

Results

Gene expression analysis

To identify and investigate the gene expression effects of WFs that may influence the behaviour of various types of cancer, we analyzed the gene expression microarray data collected from the National Center for Biotechnology Information (NCBI). We observed that WFs have 903 differentially expressed genes obtained by adjusted and . The differentially expressed genes of WFs contain 392 up and 511 down-regulated genes relative to controls. Similarly, the statistical analysis identified the most significant genes with altered expression in each cancer type. The number of differentially expressed genes we identified was 939 (503 up and 436 down) in CC, 553 (323 up and 230 down) in PC, 890 (673 up and 217 down) in LC and 691 (463 up and 228 down) in GC. We also employed a cross-comparative analysis to find the common genes with altered expression between WFs and each CD. We found that WF treated cells share a number of differentially expressed genes with for CC (36 dysregulated genes), PC (13 genes), LC (25 genes) and GC (17 genes). To identify the significant associations among these cancer types with the effects of WF exposure, we constructed two separate gene-disease association-ship networks for up and down-regulated genes using Cytoscape plugins[41], centered on the WF data as shown in Fig. 2(a,b). The necessary condition for two diseases to be associated is they must have at least one or more common differentially expressed genes in between them. Notably, two particular significant genes, C2orf88 and IGFBP5 were differentially expressed among WF exposure, CC and PC; and three significant genes, FCGBP, IQGAP2 and HPGD are common among WF exposure, CC and GC. One gene, FGFR3, is commonly dysregulated among WF exposure, CC and LC.
Figure 2

(a) Up-regulated gene-disease association network of welding fumes (WFs) exposure with colorectal cancer (CC), prostate cancer (PC), lung cancer (LC) and gastric cancer (GC). Octagon-shaped red-colored nodes represent different cancer types and sky-blue colored round-shaped nodes represent commonly up-regulated genes for WFs with the cancers examined. (b) Down-regulated gene-disease association network of welding fumes (WFs) exposure with colorectal cancer (CC), prostate cancer (PC), lung cancer (LC) and gastric cancer (GC). Octagon-shaped red colored nodes represent different cancer types and dark-cyan colored round-shaped nodes represent commonly down-regulated genes for WFs exposure with the different types of cancer examined. (c) Diseasome network showing validation of our study. Red colored octagon-shaped nodes represent different cancer types, pink-colored octagon-shaped nodes represent our selected four CDs and round-shaped sky-blue colored nodes represent differentially expressed genes for WFs exposure. A link is placed between a disease and a gene if mutations in that gene lead to the specific disease.

(a) Up-regulated gene-disease association network of welding fumes (WFs) exposure with colorectal cancer (CC), prostate cancer (PC), lung cancer (LC) and gastric cancer (GC). Octagon-shaped red-colored nodes represent different cancer types and sky-blue colored round-shaped nodes represent commonly up-regulated genes for WFs with the cancers examined. (b) Down-regulated gene-disease association network of welding fumes (WFs) exposure with colorectal cancer (CC), prostate cancer (PC), lung cancer (LC) and gastric cancer (GC). Octagon-shaped red colored nodes represent different cancer types and dark-cyan colored round-shaped nodes represent commonly down-regulated genes for WFs exposure with the different types of cancer examined. (c) Diseasome network showing validation of our study. Red colored octagon-shaped nodes represent different cancer types, pink-colored octagon-shaped nodes represent our selected four CDs and round-shaped sky-blue colored nodes represent differentially expressed genes for WFs exposure. A link is placed between a disease and a gene if mutations in that gene lead to the specific disease.

Pathway and functional association analysis

Pathways are constituted by a series of interactions at the molecular level in a cell, and are a vital key to understand the internal changes of an organism. Pathway-based analysis can be used to identify molecular or biological mechanisms that underlie the development of complex diseases[42,43]. We analyzed pathways of the commonly altered expression genes seen in WF exposure and in the cancers using Enrichr, a comprehensive web-based gene set enrichment analyzing tool[29,44]. Signaling pathways of the commonly altered expression genes of WF exposure and each type of cancer examined were analyzed using four globally recognized databases includes KEGG, WikiPathways, Reactome and BioCarta. We considered signaling pathways from the selected four databases and identified the most significant signaling pathways of each CD after applying several statistical analysis. Notably, we found 6, 7, 5 and 7 signaling pathways are associated with CC, PC, LC and GC, respectively, as shown in Fig. 3.
Figure 3

Pathway analysis for identifying the most significant signaling pathways common to the WF exposed cells and the cancer types revealed by the common differentially expressed genes. These include significant signaling pathways common to WFs exposed cells and (a) CC (b) PC (c) LC and (d) GC.

Pathway analysis for identifying the most significant signaling pathways common to the WF exposed cells and the cancer types revealed by the common differentially expressed genes. These include significant signaling pathways common to WFs exposed cells and (a) CC (b) PC (c) LC and (d) GC.

Gene ontological analysis

The Gene Ontology (GO) refers to a universal conceptual model for representing gene functions and their relationship in the domain of gene regulation. It is constantly expanded by accumulating the biological knowledge to cover the regulation of gene functions and the relationship of these functions in terms of ontology classes and semantic relations between classes[45]. We analyzed ontological pathways of the commonly altered expression genes seen in WFs exposed cells and each cancer type using two recognized databases including GO Biological Process and Human Phenotype Ontology. We considered ontological pathways from selected two databases and identified the most significant ontological pathways for each cancer type after applying several statistical analysis. We found 10, 11, 14 and 14 ontological pathways are associated with the CC, PC, LC and GC, respectively, as shown in Tables 3–6.
Table 3

The most significant ontological pathways common to the WFs exposed cells and CC.

GO TermPathwayGenes in the pathwayAdjusted p-value
GO:0009083Branched-chain amino acid catabolic processBCKDHB, ACAT17.11E-04
GO:0009081Branched-chain amino acid metabolic processBCKDHB, ACAT17.78E-04
GO:0009063Cellular amino acid catabolic processBCKDHB, ACAT18.48E-04
GO:0051924Regulation of calcium ion transportHOMER1, STC14.10E-03
GO:0032092Positive regulation of protein bindingBAMBI, TRIB35.39E-03
GO:0051726Regulation of cell cycleSON, HPGD, CDK106.82E-03
GO:0009966Regulation of signal transductionIGFBP5, HOMER1,FGFR38.49E-03
GO:0051099Positive regulation of bindingBAMBI, TRIB39.56E-03
HP:0001643Patent ductus arteriosusHPGD, NPHP3, FGFR31.46E-03
HP:0001946KetosisBCKDHB, ACAT11.16E-03
Table 6

The most significant ontological pathways common to the WFs exposed cells and GC.

GO TermPathwayGenes in the pathwayAdjusted p-value
GO:0072331Signal transduction by p53 class mediatorPMAIP1, FHIT5.20E-04
GO:2001235Positive regulation of apoptotic signaling pathwayPMAIP1, TPD52L11.92E-03
GO:0097193Intrinsic apoptotic signaling pathwayPMAIP1, FHIT3.33E-03
GO:0042981Regulation of apoptotic processUFM1, EGR3, PMAIP1, THBS14.28E-03
GO:0010634Positive regulation of epithelial cell migrationENPP2, THBS11.82E-03
GO:0045766Positive regulation of angiogenesisITGA5, THBS13.46E-03
GO:0001936Regulation of endothelial cell proliferationEGR3, THBS12.07E-03
GO:0010038Response to metal ionMTF1, THBS11.72E-03
GO:0043066Negative regulation of apoptotic processUFM1, EGR3, ITGA5, THBS16.37E-04
GO:0051094Positive regulation of developmental processENPP2, RBPJ4.14E-03
GO:0034976Response to endoplasmic reticulum stressUFM1, THBS13.66E-03
GO:0043069Negative regulation of programmed cell deathUFM1, EGR3, THBS14.66E-03
HP:0003577Congenital onsetEGR2, HPGD6.35E-03
HP:0000890Long claviclesHPGD7.63E-03
The most significant ontological pathways common to the WFs exposed cells and CC. The most significant ontological pathways common to the WFs exposed cells and PC. The most significant ontological pathways common to the WFs exposed cells and LC. The most significant ontological pathways common to the WFs exposed cells and GC.

Protein-protein interaction analysis

A protein-protein interaction network refers to the binding of proteins in the cell formed by biochemical or complex biological functions. Protein-protein interactions are essential to understand the cell physiology in health and disease states. We constructed and analyzed protein-protein interaction networks of the significantly altered expression genes of each CD using the STRING database. We clustered protein-protein interactions of cancer types into four different groups as shown in Fig. 4.
Figure 4

Protein-protein interaction network of the four types of cancer using STRING.

Protein-protein interaction network of the four types of cancer using STRING.

Survival analysis

Patient survival analysis using both gene expression and clinical data is a popularly used feature in research to predict and characterize gene signatures in cancer[46]. In this study, we estimated survival function for altered and normal groups of the significant genes that are common to WFs and the four types of cancers under investigation by employing Cox PH model and PL estimator analysis. We fitted both univariate and multivariate analysis of the Cox PH regression model. The significant genes of the four selected cancers with estimated coefficients (β), hazard ratios (HR) and p-values from those analyses are shown in Tables 7–10. After these analyses we selected the most significant genes for the four types of cancers by choosing a threshold () of the p-value. The survival curves of the most significant genes, comparing altered and normal groups had been obtained by using the PL estimator as shown in Fig. 5. Note that, from Fig. 5, we can see that those with altered expression of genes show lower survival compared to the normal group.
Table 7

β coefficient, hazard ratio and p-values in univariate, multivariate and combined models of the identified genes that are common between WFs and CC.

Gene symbolUnivariateMultivariateCombined
βHRp-valueβHRp-valueβHRp-value
NPHP3−1.74E-018.40E-013.27E-01−2.52E-017.77E-012.34E-01−5.21E-029.49E-018.38E-01
SAMD138.58E-021.09E+006.66E-011.24E-021.01E+009.58E-011.97E-011.22E+004.55E-01
STC19.02E-021.09E+006.19E-011.08E-011.11E+006.06E-01−1.08E-018.98E-016.56E-01
THEM69.20E-021.10E+005.89E-018.11E-021.08E+006.97E-012.95E-011.34E+002.10E-01
TNFRSF12A3.16E-011.37E+008.94E-026.14E-011.85E+005.27E-038.19E-012.27E+006.48E-01
ASNS1.35E-011.14E+005.23E-011.66E-011.18E+004.86E-012.38E-011.27E+004.17E-01
FAM89A2.19E-011.24E+002.12E-013.28E-011.39E+001.12E-011.80E-011.20E+004.13E-01
ACAT1−3.24E-029.68E-018.49E-01−9.12E-029.13E-016.55E-011.41E-011.15E+005.78E-01
HPGD3.90E-011.48E+006.10E-023.63E-011.44E+001.25E-012.60E-011.30E+003.36E-01
SLC7A83.99E016.71E-015.01E-022.38E-017.88E-013.46E-015.80E-015.60E-014.87E-02
TRIB31.30E-021.01E+009.39E-01−9.81E-029.07E-016.12E-01−3.09E-017.34E-011.58E-01
APPL2−1.63E-018.50E-014.50E-01−1.29E-018.79E-015.98E-015.43E-021.06E+008.45E-01
LRRC75B1.42E-018.67E-014.52E-013.28E-017.20E-011.35E-015.23E-015.93E-013.58E-02
BAMBI4.46E-011.56E+001.39E-025.17E-011.68E+001.53E-022.39E-011.27E+003.32E-01
FGFR32.06E-018.14E-013.04E-016.86E-015.03E-014.39E-038.88E-014.11E-019.54E-03
DMXL25.36E-021.06E+007.46E-013.93E-021.04E+008.45E-011.93E-011.21E+003.98E-01
CCL202.16E-011.24E+002.62E-014.86E-021.05E+008.24E-011.03E-021.01E+009.67E-01
SERPINE2−3.53E-017.03E-016.55E-02−4.57E-016.33E-014.42E-02−5.14E-015.98E-014.31E-01
CDK103.49E-021.04E+008.50E-011.02E-011.11E+006.21E-014.02E-011.50E+009.56E-02
SON4.35E-011.54E+001.48E-022.89E-011.33E+001.44E-013.43E-011.41E+001.78E-01
FCGBP1.18E-011.12E+005.39E-012.44E-011.28E+002.91E-012.33E-011.26E+003.44E-01
IQGAP2−6.58E-029.36E-017.27E-011.66E-011.18E+004.71E-012.61E-011.30E+002.94E-01
LUC7L3−9.57E-039.90E-019.57E-01−2.53E-017.77E-012.32E-01−2.38E-029.77E-019.22E-01
BCKDHB2.77E-011.32E+001.01E-014.12E-011.51E+003.43E-025.91E-011.81E+006.52E-03
LYAR2.14E-011.24E+002.71E-011.13E-011.12E+006.14E-01−1.11E-018.95E-016.57E-01
NAAA2.33E-017.92E-012.44E-012.19E-018.03E-014.02E-015.92E-015.53E-012.86E-02
SKAP2−1.80E-018.35E-013.12E-01−2.83E-017.54E-011.70E-01−1.09E-018.97E-016.46E-01
BHLHE404.44E-031.00E+009.81E-01−1.13E-018.94E-016.30E-01−1.50E-018.61E-015.75E-01
IGFBP5−4.24E-029.59E-018.31E-019.18E-021.10E+006.98E-012.18E-011.24E+003.99E-01
SNX142.05E-021.02E+009.08E-016.87E-021.07E+007.37E-01−1.35E-018.74E-015.59E-01
HOMER1−1.84E-018.32E-013.60E-01−2.37E-017.89E-012.83E-01−3.02E-017.39E-012.03E-01
PAPLN−3.29E-017.20E-011.23E-01−4.48E-016.39E-018.79E-02−2.94E-017.45E-013.21E-01
Table 10

β coefficient, hazard ratio and p-values in univariate, multivariate and combined models of the identified genes that are common between WFs and GC.

Gene symbolUnivariateMultivariateCombined
βHRp-valueβHRp-valueβHRp-value
ABCA41.44E-011.16E+006.89E-012.86E-011.33E+004.53E-013.51E-011.42E+003.60E-01
CNN2−9.03E-029.14E-016.45E-01−2.10E-018.10E-013.12E-01−3.64E-016.95E-019.73E-02
EGR2−2.19E-018.03E-016.64E-01−9.34E-029.11E-018.58E-01−7.03E-029.32E-018.96E-01
EGR3−2.51E-017.78E-014.41E-01−1.95E-018.23E-015.64E-01−5.92E-029.43E-018.65E-01
ENPP2−6.84E-029.34E-018.04E-01−6.85E-029.34E-018.10E-016.74E-021.07E+008.20E-01
FCGBP−5.25E-029.49E-018.66E-01−2.88E-029.72E-019.29E-01−6.37E-039.94E-019.84E-01
FHIT−2.65E-029.74E-019.38E-013.03E-011.35E+003.98E-013.37E-011.40E+003.51E-01
HPGD4.71E-011.60E+008.91E-026.97E-012.01E+002.22E-026.83E-011.98E+002.90E-02
IQGAP2−4.14E-016.61E-011.82E-01−7.08E-014.93E-013.29E-02−6.01E-015.48E-017.69E-02
ITGA5−8.11E-014.45E-015.39E-02−9.77E-013.77E-012.95E-02−1.16E+003.15E-011.35E-02
MTF1−9.49E-029.10E-016.40E-01−1.21E-018.86E-015.76E-01−6.43E-029.38E-017.71E-01
PMAIP1−5.46E-029.47E-018.44E-01−1.10E-018.96E-017.18E-01−6.08E-029.41E-018.45E-01
RBPJ5.92E-011.81E+006.64E-037.14E-012.04E+002.41E-037.31E-012.08E+002.37E-03
THBS1−2.71E-017.63E-014.52E-01−2.75E-017.60E-014.62E-01−2.04E-018.15E-015.85E-01
TPD52L12.90E-031.00E+009.92E-01−1.43E-018.67E-016.54E-01−1.79E-018.36E-015.80E-01
TROVE2−1.26E-029.88E-019.54E-01−1.33E-018.75E-015.78E-01−1.54E-018.57E-015.25E-01
UFM1−1.47E-018.63E-013.50E-01−1.61E-018.51E-013.52E-01−2.06E-018.14E-012.46E-01
Figure 5

Survival function for an altered and normal group of the most significant genes that are common to WFs and the four types of cancers under investigation. These include significant genes common to WFs exposed cells and CC (a–e), PC (f,g), LC (h,i) and GC (j–l). Here, the cyan colored line in the survival graphs indicates the altered and the red indicates the normal gene expression group.

β coefficient, hazard ratio and p-values in univariate, multivariate and combined models of the identified genes that are common between WFs and CC. β coefficient, hazard ratio and p-values in univariate, multivariate and combined models of the identified genes that are common between WFs and PC. β coefficient, hazard ratio and p-values in univariate, multivariate and combined models of the identified genes that are common between WFs and LC. β coefficient, hazard ratio and p-values in univariate, multivariate and combined models of the identified genes that are common between WFs and GC. Survival function for an altered and normal group of the most significant genes that are common to WFs and the four types of cancers under investigation. These include significant genes common to WFs exposed cells and CC (a–e), PC (f,g), LC (h,i) and GC (j–l). Here, the cyan colored line in the survival graphs indicates the altered and the red indicates the normal gene expression group.

Discussion

In this study we investigated how WF exposure may influence a number of types of cancer whose development and growth is greater with exposure to WFs or the components of WFs. We compared the gene expression alterations that result from WF exposure in cells with that of the genes that have dysregulated expression in several cancer types. The idea behind this is similar to studies of comorbidities, where dysregulated genes (or more usually gene pathways) that are common to two diseases give clues to how those diseases interact when co-occurring in the same individual, even if we are unclear as to the reason for the altered expression of individual genes or pathways is unclear. Thus, genes or gene pathways altered in response to WF exposure and the cancers of interest can be means by which WF exposure encourages those cancers to develop. Note that WFs included components such as metal fumes that are absorbed by the lungs into the bloodstream, to expose many tissues around the body. Many of these fumes are carcinogenic, but cancer initiation is only one of a number of stages of cancer development and progression, and welders commonly have regular exposure to fumes over long periods. Unlike in other morbidities, some altered gene expression may arise in individual cancer cells due to mutations which will affect survival of those cells; if such altered expression the is detected in whole cancer tissue across many individuals (as in our studies) then the alteration may be affecting pathways that encourage survival and growth. Thus we have applied a systematic approach to identify pathways that WFs may affect the cancer behaviours. For our analysis we employed gene regulation analysis, gene-disease association networks, signaling and ontological pathways, and protein-protein interaction networks. To identify pathways and genes that are important in WF interactions in the cellular processes that influence cancer progression, we examined gene expression microarray data from WF exposed cells, CC, PC, LC and GC, each with control datasets. This identified a large number of significant genes that were commonly dysregulated between WF-exposure and cancer profiles, and evident by simple gene expression comparisons. There were a number of dysregulated genes that were common between WF exposure responses and cancer types, which suggests that WF exposure may cause gene expression changes that could affect the behaviour of cancers. It should be noted that the cancer transcriptome datasets, such as those employed here, contain transcripts from both cancer cells and the supporting stromal cells found in the tumors themselves. Thus, it should be noted that WFs may exert their effects on cancers either indirectly (through tumor stroma) or on the cancer cells themselves. We constructed two separate gene-disease association networks for up- and down-regulated genes showed strong evidence that WFs may indeed influence these cancers as indicated in Fig. 2(a,b). The pathway-based analysis is a technique to better understand the molecular or biological mechanisms underlying different complex diseases by determining common pathways that a stimulus (such as WFs) may influence cells of interest. We identified significant signaling and ontological pathways of the commonly dysregulated genes of each cancer. These identified pathways indicated how WFs may affect these cancer types. Similarly, protein-protein interaction sub-networks of the commonly altered genes suggest that WFs affect several types of cancers. Note that if a pathway is a conduit for the effects of an important risk factor for a disease, this points to that pathway being particularly important to the pathogenesis of the disease and that reducing that pathways effects could be a way to attack the disease progression itself. It should be noted that these findings only point to possible ways that WF exposure may affect the cancers and cannot prove causation. However, when we investigated whether the gene expression patterns that we have observed could be associated with reduced survival of the patients (pointing to the importance of those gene expression levels either directly or indirectly) that is what we observed for several of significant genes that are common WF the cancer profiles under investigation as shown in Fig. 5. It should be noted that the datasets employ a number of different cell types, which is commonly the case in this type of study. While gene expression patterns are, by definition, different in different cell types, here we were only concerned with expression alterations; certain responses to WFs may not occur in all cells so, while our approach cannot identify all pathways affected by WFs in nascent tumour cells, it will find some. Indeed, our data provides evidence to suggest the involvement of a number of genes in cancer behaviours that are linked to the noxious effect of WFs on cancer. We used the gold benchmark databases OMIM and dbGaP for cross checking the validity of our outcome and found that there were some shared genes in between the WF exposure and cancer types as shown in Fig. 2(c). For validation purposes, we collected disease with associated genes from the dbGaP, OMIM Disease and OMIM Expanded databases using differentially expressed genes of WFs. After several steps of statistical analysis we selected only cancer related diseases. Interestingly, we found our selected four cancers among the list of cancers collected from the mentioned databases as shown in Fig. 2(c). Moreover, we found our identified genes in Fig. 2(c) had been shown in other studies to be associated with disease progression in cancers. Specifically, Vázquez-ArreguÃn K. et al., Cybulski C. et al. and Wang L. et al. shown RAB4B, CHEK2 and FOS to be associated with CC incidence[47-50]; Biswas S. et al. found a link between TGFBR2 and CC[51]. Lijovic M. et al. showed CD82 to be linked to PC incidence[52]; Wang Y. et al. shown the association between CHEK2 and PC progression[53]; Ouyang X. et al. identified a link between FOS and PC[54]; Gruosso T. et al. showed MAP3K8 to be associated with LC[55]; Vallejo A. et al. found a link between FOS and LC incidence[56]; Yuan S. et al. showed an association between GPC5 and LC progression[57]. Kim CJ. et al. found MUTYH to be associated to GC incidence[58]; Myllykangas S. found an association between FOS and GC[59]; Teodorczyk U. et al. found CHEK2 to be linked to GC progression[60]. Therefore, it suggested that WFs may have a strong interaction with CC, PC, LC and GC.

Conclusions

In this study, we considered gene expression microarray data from WFs exposure, CC, PC, LC, GC and control datasets to analyze and investigate the genetic links between WF exposure and the effects that they have on cancers. We analyzed gene expression, constructed gene-disease association networks, identified signaling and ontological pathways, analyzed protein-protein interaction networks and survival function of WFs exposed cells and cancers. The outcome of our study indicated that WFs can exert a strong influence on cancers. This kind of study will be useful for making more accurate disease prediction, and identifyi potentially better therapeutic approaches. This study will also be useful for assessing the dangerous effects of welding on the human body.
Table 4

The most significant ontological pathways common to the WFs exposed cells and PC.

GO TermPathwayGenes in the pathwayAdjusted p-value
GO:0071420Cellular response to histamineGABRB34.79E-03
GO:0044321Response to leptinLEPR5.39E-03
GO:0035024Negative regulation of Rho protein signal transductionARAP38.97E-03
GO:2000369Regulation of clathrin-dependent endocytosisARAP39.56E-03
GO:0071417Cellular response to organonitrogen compoundGABRB3, IGFBP51.52E-03
GO:2000146Negative regulation of cell motilityIGFBP5, ARAP31.52E-03
GO:0030336Negative regulation of cell migrationIGFBP5, ARAP32.34E-03
GO:0071407Cellular response to organic cyclic compoundGABRB3, IGFBP52.90E-03
GO:0014912Negative regulation of smooth muscle cell migrationIGFBP58.97E-03
HP:0000823Delayed pubertyLEPR1.67E-02
HP:0000824Growth hormone deficiencyLEPR1.67E-02
Table 5

The most significant ontological pathways common to the WFs exposed cells and LC.

GO TermPathwayGenes in the pathwayAdjusted p-value
GO:1903708Positive regulation of hemopoiesisN4BP2L2, HOXA54.18E-05
GO:0009132Nucleoside diphosphate metabolic processAK49.96E-03
GO:0045647Negative regulation of erythrocyte differentiationHOXA58.72E-03
GO:2000665Regulation of interleukin-13 secretionPRKCZ8.72E-03
GO:0044320Cellular response to leptin stimulusLEPR9.96E-03
GO:0045837Negative regulation of membrane potentialPMAIP18.72E-03
GO:2000394Positive regulation of lamellipodium morphogenesisENPP29.96E-03
GO:0050730Regulation of peptidyl-tyrosine phosphorylationENPP2, PRKCZ5.14E-03
GO:0032754Positive regulation of interleukin-5 productionPRKCZ9.96E-03
HP:0000975HyperhidrosisSLCO2A1, FGFR35.38E-03
HP:0000522AlacrimaFGFR31.24E-02
HP:0010662Abnormality of the diencephalonLEPR1.86E-02
HP:0000495Recurrent corneal erosionsFGFR31.49E-02
HP:0001413Micronodular cirrhosisKRT81.86E-02
Table 8

β coefficient, hazard ratio and p-values in univariate, multivariate and combined models of the identified genes that are common between WFs and PC.

Gene symbolUnivariateMultivariateCombined
βHRp-valueβHRp-valueβHRp-value
ARAP31.12E-011.12E+004.48E-011.81E-011.20E+002.57E-012.38E-011.27E+001.39E-01
CA141.49E-011.16E+003.74E-011.64E-011.18E+003.59E-012.11E-011.24E+002.45E-01
F2RL2−4.70E-029.54E-017.53E-01−8.16E-029.22E-016.00E-01−5.31E-029.48E-017.41E-01
FNIP23.27E-031.00E+009.84E-01−6.86E-039.93E-019.68E-011.20E-011.13E+004.93E-01
GABRB3−9.83E-029.06E-014.13E-01−1.21E-018.86E-013.28E-01−1.14E-018.93E-013.67E-01
IGFBP53.63E-011.44E+002.28E-023.85E-011.47E+001.86E-024.17E-011.52E+001.23E-02
LEPR−1.95E-029.81E-018.78E-01−2.36E-029.77E-018.57E-01−9.40E-029.10E-014.84E-01
NDRG24.03E-011.50E+001.61E-023.80E-011.46E+002.74E-024.16E-011.52E+001.85E-02
SPON1−8.50E-029.19E-016.38E-01−2.19E-018.04E-012.66E-01−3.41E-017.11E-018.93E-02
TOX3−2.11E-029.79E-018.95E-018.41E-031.01E+009.60E-01−5.66E-029.45E-017.40E-01
Table 9

β coefficient, hazard ratio and p-values in univariate, multivariate and combined models of the identified genes that are common between WFs and LC.

Gene symbolUnivariateMultivariateCombined
βHRp-valueβHRp-valueβHRp-value
PROS1−5.57E-029.46E-018.12E-012.56E-021.03E+009.28E-012.42E-011.27E+004.09E-01
SPON11.76E-011.19E+005.24E-014.52E-011.57E+001.72E-014.30E-011.54E+002.48E-01
KCNK1−2.05E-018.15E-015.02E-01−1.35E-018.73E-017.04E-01−2.63E-017.68E-014.76E-01
KRT81.19E-011.13E+006.81E-012.11E-011.23E+005.35E-012.88E-011.33E+004.21E-01
ID37.21E-021.07E+008.34E-011.62E-011.18E+007.43E-011.13E-011.12E+008.39E-01
CRYAB−1.58E-018.54E-015.87E-01−8.98E-039.91E-019.79E-01−1.44E-018.66E-016.85E-01
FGFR34.95E-011.64E+001.49E-015.82E-011.79E+001.71E-014.73E-011.60E+002.80E-01
ENPP2−3.64E-016.95E-011.99E-01−5.45E-015.80E-011.26E-01−5.27E-015.90E-011.44E-01
PMAIP12.78E-011.32E+004.37E-016.49E-011.91E+001.31E-011.02E+002.78E+002.91E-02
RGS17−1.95E-018.23E-014.77E-01−2.73E-017.61E-014.00E-01−4.14E-029.59E-018.99E-01
MTHFD22.14E-011.24E+004.29E-014.27E-011.53E+002.89E-015.09E-011.66E+002.13E-01
ARAP33.37E-011.40E+002.32E-013.25E-011.38E+003.89E-015.24E-011.69E+001.83E-01
TGFA3.10E-011.36E+003.66E-013.53E-011.42E+004.32E-014.43E-011.56E+003.62E-01
HOXA5−6.82E-015.06E-011.41E-01−1.59E+002.03E-019.55E-03−1.86E+001.56E-015.18E-03
PNISR−1.38E-018.71E-016.23E-01−6.86E-017.56E-014.19E-01−4.15E-016.60E-012.44E-01
GADD45A2.66E-011.31E+003.07E-01−2.80E-011.07E+008.28E-012.42E-011.27E+004.78E-01
FHL2−3.39E-029.67E-019.02E-017.20E-026.50E-012.89E-01−2.87E-017.51E-014.61E-01
TOX3−1.10E-018.96E-016.91E-01−4.30E-015.89E-011.65E-01−7.57E-014.69E-016.12E-02
LEPR1.51E-011.16E+005.71E-01−5.29E-011.12E+007.35E-012.02E-011.22E+005.45E-01
SLCO2A1−2.08E-018.13E-015.45E-011.12E-018.36E-016.82E-01−1.20E-018.87E-017.90E-01
SLC7A51.42E-011.15E+005.88E-01−1.79E-011.23E+005.05E-01−2.26E-029.78E-019.44E-01
MPDZ2.37E-011.27E+003.00E-012.11E-011.34E+002.58E-014.03E-011.50E+001.42E-01
  25 in total

1.  Global gene expression analysis of gastric cancer by oligonucleotide microarrays.

Authors:  Yoshitaka Hippo; Hirokazu Taniguchi; Shuichi Tsutsumi; Naoko Machida; Ja-Mun Chong; Masashi Fukayama; Tatsuhiko Kodama; Hiroyuki Aburatani
Journal:  Cancer Res       Date:  2002-01-01       Impact factor: 12.701

2.  Genetic effects of welding fumes on the progression of neurodegenerative diseases.

Authors:  Humayan Kabir Rana; Mst Rashida Akhtar; Mohammad Boshir Ahmed; Pietro Liò; Julian M W Quinn; Fazlul Huq; Mohammad Ali Moni
Journal:  Neurotoxicology       Date:  2018-12-17       Impact factor: 4.294

3.  Carcinogenicity of welding, molybdenum trioxide, and indium tin oxide.

Authors:  Neela Guha; Dana Loomis; Kathryn Z Guyton; Yann Grosse; Fatiha El Ghissassi; Véronique Bouvard; Lamia Benbrahim-Tallaa; Nadia Vilahur; Karen Muller; Kurt Straif
Journal:  Lancet Oncol       Date:  2017-04-10       Impact factor: 41.316

Review 4.  Occupational exposures and colorectal cancers: a quantitative overview of epidemiological evidence.

Authors:  Enrico Oddone; Carlo Modonesi; Gemma Gatta
Journal:  World J Gastroenterol       Date:  2014-09-21       Impact factor: 5.742

5.  [On the purification and characterization of 2,3-PGase of red blood cells].

Authors:  G Sauer; D Scholz
Journal:  Folia Haematol Int Mag Klin Morphol Blutforsch       Date:  1965

6.  CLC and IFNAR1 are differentially expressed and a global immunity score is distinct between early- and late-onset colorectal cancer.

Authors:  T H Ågesen; M Berg; T Clancy; E Thiis-Evensen; L Cekaite; G E Lind; J M Nesland; A Bakka; T Mala; H J Hauss; T Fetveit; M H Vatn; E Hovig; A Nesbakken; R A Lothe; R I Skotheim
Journal:  Genes Immun       Date:  2011-06-30       Impact factor: 2.676

7.  Enrichr: a comprehensive gene set enrichment analysis web server 2016 update.

Authors:  Maxim V Kuleshov; Matthew R Jones; Andrew D Rouillard; Nicolas F Fernandez; Qiaonan Duan; Zichen Wang; Simon Koplev; Sherry L Jenkins; Kathleen M Jagodnik; Alexander Lachmann; Michael G McDermott; Caroline D Monteiro; Gregory W Gundersen; Avi Ma'ayan
Journal:  Nucleic Acids Res       Date:  2016-05-03       Impact factor: 16.971

Review 8.  Small-cell lung cancer: what we know, what we need to know and the path forward.

Authors:  Adi F Gazdar; Paul A Bunn; John D Minna
Journal:  Nat Rev Cancer       Date:  2017-10-27       Impact factor: 60.716

9.  Cell-of-Origin Patterns Dominate the Molecular Classification of 10,000 Tumors from 33 Types of Cancer.

Authors:  Katherine A Hoadley; Christina Yau; Toshinori Hinoue; Denise M Wolf; Alexander J Lazar; Esther Drill; Ronglai Shen; Alison M Taylor; Andrew D Cherniack; Vésteinn Thorsson; Rehan Akbani; Reanne Bowlby; Christopher K Wong; Maciej Wiznerowicz; Francisco Sanchez-Vega; A Gordon Robertson; Barbara G Schneider; Michael S Lawrence; Houtan Noushmehr; Tathiane M Malta; Joshua M Stuart; Christopher C Benz; Peter W Laird
Journal:  Cell       Date:  2018-04-05       Impact factor: 41.582

10.  Gene expression signature of cigarette smoking and its role in lung adenocarcinoma development and survival.

Authors:  Maria Teresa Landi; Tatiana Dracheva; Melissa Rotunno; Jonine D Figueroa; Huaitian Liu; Abhijit Dasgupta; Felecia E Mann; Junya Fukuoka; Megan Hames; Andrew W Bergen; Sharon E Murphy; Ping Yang; Angela C Pesatori; Dario Consonni; Pier Alberto Bertazzi; Sholom Wacholder; Joanna H Shih; Neil E Caporaso; Jin Jen
Journal:  PLoS One       Date:  2008-02-20       Impact factor: 3.240

View more
  4 in total

1.  ESRRG, ATP4A, and ATP4B as Diagnostic Biomarkers for Gastric Cancer: A Bioinformatic Analysis Based on Machine Learning.

Authors:  Qiu Chen; Yu Wang; Yongjun Liu; Bin Xi
Journal:  Front Physiol       Date:  2022-06-23       Impact factor: 4.755

2.  Classify multicategory outcome in patients with lung adenocarcinoma using clinical, transcriptomic and clinico-transcriptomic data: machine learning versus multinomial models.

Authors:  Fei Deng; Lanlan Shen; He Wang; Lanjing Zhang
Journal:  Am J Cancer Res       Date:  2020-12-01       Impact factor: 6.166

3.  Bioinformatics and System Biological Approaches for the Identification of Genetic Risk Factors in the Progression of Cardiovascular Disease.

Authors:  Joy Dip Barua; Shudeb Babu Sen Omit; Humayan Kabir Rana; Nitun Kumar Podder; Utpala Nanda Chowdhury; Md Habibur Rahman
Journal:  Cardiovasc Ther       Date:  2022-08-09       Impact factor: 3.368

4.  Network-Based Data Analysis Reveals Ion Channel-Related Gene Features in COVID-19: A Bioinformatic Approach.

Authors:  Hao Zhang; Ting Feng
Journal:  Biochem Genet       Date:  2022-09-14       Impact factor: 2.220

  4 in total

北京卡尤迪生物科技股份有限公司 © 2022-2023.