This study investigates the molecular markers and biological pathways of pulmonary lymphangioleiomyomatosis. We analyzed 2 gene expression profiles in the gene expression omnibus Gene Expression Omnibus database for normal lung tissue and lymphangioleiomyomatosis and identified differential expressed genes in pulmonary lymphangioleiomyomatosis. Ninety-one differentially expressed genes were identified, including 36 upregulated genes and 55 downregulated genes. Hub genes and pathogenic pathways associated with disease development were subsequently identified by enrichment analysis and protein-protein interaction network. Analysis showed that differential expressed genes are mainly involved in the biological behavior of tumor cell proliferation and invasion as well as the inflammatory response. We have identified 10 hub genes in the protein-protein interaction network. Hub genes play an important role in the proliferation and inflammatory response involved in tumor cell proliferation. This study deepens the understanding of lymphangioleiomyomatosis disease and provides a biological basis for further clinical diagnosis and treatment.
This study investigates the molecular markers and biological pathways of pulmonary lymphangioleiomyomatosis. We analyzed 2 gene expression profiles in the gene expression omnibus Gene Expression Omnibus database for normal lung tissue and lymphangioleiomyomatosis and identified differential expressed genes in pulmonary lymphangioleiomyomatosis. Ninety-one differentially expressed genes were identified, including 36 upregulated genes and 55 downregulated genes. Hub genes and pathogenic pathways associated with disease development were subsequently identified by enrichment analysis and protein-protein interaction network. Analysis showed that differential expressed genes are mainly involved in the biological behavior of tumor cell proliferation and invasion as well as the inflammatory response. We have identified 10 hub genes in the protein-protein interaction network. Hub genes play an important role in the proliferation and inflammatory response involved in tumor cell proliferation. This study deepens the understanding of lymphangioleiomyomatosis disease and provides a biological basis for further clinical diagnosis and treatment.
Lymphangioleiomyomatosis (LAM) is a rare disease that occurs in women of childbearing age and can involve the lungs, mediastinum, and retroperitoneal lymphatics.[ It is characterized histologically by abnormal proliferation of smooth muscle-like cells that can spread through the lymphatic system, with the lungs being the most commonly involved organ.[ Pulmonary LAM (PLAM) was once thought to be an interstitial lung disease, but later studies have described PLAM as being intermediate between benign and malignant disease.[ It is a mutation and loss of heterozygosity in the TSC1 gene or TSC2 gene caused by abnormal smooth muscle-like LAM cell proliferation.[ The diagnosis and treatment of PLAM is a worldwide challenge. In this study, we first selected GES12027 and GSE16944 from the Gene Expression Omnibus (GEO) database. We applied R software for data analysis and obtained the common differential expressed genes (DEGs) of GSE12027 and GSE160618. Molecular function (MF), cellular composition (CC), biological processes (BP), and Kyoto Encyclopedia of Genes and Genomes (KEGG) pathways were analyzed using R software. We then built the protein–protein interaction (PPI) network and performed some additional analysis with Cytoscape. Our aim is to initially search for pathogenic mechanisms and pathways of PLAM at the genetic level to inform future diagnosis and treatment.
2. Methods
2.1. Microarray data information
National Center for Biotechnology Information-GEO is a free public microarray/gene mapping database. The search strategy ((“lymphangioleiomyomatosis”[MeSH Terms] OR Lymphangioleiomyomatosis[All Fields]) AND “Expression profiling by array”[Filter]) was obtained.Inclusion criteria are: the sample source was Homo sapiens; and normal tissue and LAM tissue were controls.We obtained the gene expression profiles of GSE12027 and GSE160618 in LAM and normal lung tissue from the GEO database. Microarray data of GSE12027 were on account of GPL96 ([HG-U133A] Affymetrix Human Genome U133A Array), Microarray data of GSE160618 were on account of GPL570 ([HG-U133_Plus_2] Affymetrix Human Genome U133 Plus 2.0 Array). Microarray data of GSE12027 contain 14 female LAM nodules and 7 pulmonary artery smooth muscles (CC-2581, Lonza). Microarray data of GSE160618 contain 6 LAM samples and 6 normal samples. Data table header descriptions and series matrix files of GSE12027 and GSE160618 were downloaded.
2.2. Data processing of DEGs
Gene annotation was performed using clusterProfiler of R software program, and the data set was normalized by quantile logarithm[ (Fig. 1).
Figure 1.
Box plots of gene expression data before and after standardization. (A) unstandardized for GSE12027, (B) standardized for GSE12027, (C) unstandardized for GS E160618, and (D) standardized for GSE160618. x-axis labels represent sample symbols and y-axis labels represent gene expression values. The black line in the box plot represents the median gene expression value.
Box plots of gene expression data before and after standardization. (A) unstandardized for GSE12027, (B) standardized for GSE12027, (C) unstandardized for GS E160618, and (D) standardized for GSE160618. x-axis labels represent sample symbols and y-axis labels represent gene expression values. The black line in the box plot represents the median gene expression value.The limma V3.50.0 (linear models for microarray data) package of the R software program (version 4.1.2) was used to screen the samples for DEGs with a threshold of |log FC| >1, P < .05.[The ggplot2 package of the R software program (version 3.5.0) was used to produce volcano plots of the DEGs. The pheatmap package of the R software program (version 3.5.0) was used to produce heat maps of the DEGs and build a Venn diagram.[
2.3. Gene Ontology and pathway enrichment analysis
Gene Ontology (GO) is an international standard classification system for gene function, which aims to establish a linguistic vocabulary standard for qualifying and describing gene and protein function that is applicable to all species and can be updated as research progresses. GO is divided into 3 parts: MF, BP, and cellular component (CC).[KEGG is a database on genome decipherment.[In this study, GO terms and KEGG pathway enrichment analysis of DEGs were automatically completed and visualized by the clusterProiler digest, and the Goplot package in the R software statistical analysis platform (significant as P < .05 and a q-value < 0.05) and the CluePedia plugin and ClueGO plug-in in Cytoscape software (version 3.7.1, http://www.cytoscape.org/) (with a kappa score ≥0.4).[
2.4. Gene Set Enrichment Analysis
We used GSESA software for further enrichment analysis of all differential genes. Gene sets were considered significantly enriched with alpha or P values <5% and false discovery rate <25%.[
2.5. Construction of PPI network and identification of hub genes
STRING is an online tool to identify and predict interactions between genes or proteins. We use STRING to construct PPI networks of DEGs. Visualization of DEG PPI network using Cytoscape software. We use biNGO plug-in and cytoHubba plug-in for analysis.[
2.6. DO analysis
Similar to GO database, Disease Ontolog (DO) provides a unified and standardized disease classification system by referring to MeSH, ICD, and other disease classification standards for common human diseases and rare diseases. The DOSE package of the R software program (version 3.5.0) was used to produce plots of the hub genes.
3. Result
3.1. Identification of DEGs
To identify DEGs in LAM and normal tissue samples, we screened 2334 and 496 DEGs from GSE12027 and GSE160618, respectively. After removing the null genes as well as mitochondrial genes, there were 1138 upregulated genes and 1191 downregulated genes in the DEGs of GSE12027. We had 207 upregulated genes and 263 downregulated genes in GSE160168 DEGs. We mapped the differential genes to the volcano map and heat map. There were 36 common upregulated genes and 55 common downregulated genes in both samples. We built the Venn graph with Package (VennDiagram) of R software (version 4.1.2; Figs. 2 and 3; Table 1).
Figure 2.
Volcano plots of differential expressed genes. Red data points represent upregulated genes and blue data points represent downregulated genes. Differences are set to | log FC | >1; heat map of differential expressed genes identified in (A) GSE12027, (B) GSE160618.
Figure 3.
Venn diagrams of commonly differential expressed genes in the 2 datasets.
Table 1
Differential expressed genes of GSE12027 and GSE160618.
Differential expressed genes of GSE12027 and GSE160618.DEG = differentially expressed gene.Volcano plots of differential expressed genes. Red data points represent upregulated genes and blue data points represent downregulated genes. Differences are set to | log FC | >1; heat map of differential expressed genes identified in (A) GSE12027, (B) GSE160618.Venn diagrams of commonly differential expressed genes in the 2 datasets.
3.2. GO and KEGG pathway enrichment analyses
We used Package (clusterProiler, Digest, and Goplot) of the R software (version 4.1.2) to perform KEGG and GO analysis on 36 upregulated genes and 55 downregulated genes. GO analysis showed that the significant enrichment in BP was epithelial cell proliferation, negative regulation of cellular component movement, muscle system process, negative regulation of cell motility, regulation of body fluid levels, negative regulation of locomotion, muscle tissue development, response to peptide hormone, negative regulation of cell migration, regulation of reproductive process. In CC, differential genes were enriched in collagen-containing extracellular matrix, cell–cell junction, focal adhesion, cell–substrate junction, platelet alpha granule, perikaryon, platelet alpha granule lumen, rough endoplasmic reticulum, basement membrane, and T-tubule. For MF, genes were significantly enriched in receptor ligand activity, signaling receptor activator activity, glycosaminoglycan binding, sulfur compound binding, G protein-coupled peptide receptor activity, peptide receptor activity, heparin binding, fibronectin binding, peptide hormone binding, and hormone receptor binding (Fig. 4).
Figure 4.
GO enrichment analysis of DEGs. The bubble and bar charts show the GO significance items of DEGs in 3 functional groups: MF, BP, and CC. BP = biological process, CC = cell composition, DEG = differentially expressed gene, GO = Gene Ontology, MF = molecular function.
GO enrichment analysis of DEGs. The bubble and bar charts show the GO significance items of DEGs in 3 functional groups: MF, BP, and CC. BP = biological process, CC = cell composition, DEG = differentially expressed gene, GO = Gene Ontology, MF = molecular function.In the KEGG analysis, the differential genes were mainly enriched in hypertrophic cardiomyopathy, Ras signaling pathway, arrhythmogenic right ventricular cardiomyopathy, melanoma, prolactin signaling pathway, Focal adhesion, choline metabolism in cancer, calcium signaling pathway, type II diabetes mellitus, MAPK signaling pathway, cAMP signaling pathway, pathways in cancer, dilated cardiomyopathy, cGMP-PKG signaling pathway, and cytokine–cytokine receptor interaction (Fig. 5).
Figure 5.
KEGG pathway analysis of DEGs. DEG = differentially expressed gene, KEGG = Kyoto Encyclopaedia of Genes and Genomes.
KEGG pathway analysis of DEGs. DEG = differentially expressed gene, KEGG = Kyoto Encyclopaedia of Genes and Genomes.
3.3. Gene Set Enrichment Analysis
We further subjected the LAM samples to Gene Set Enrichment Analysis (GSEA), and we found that these genes were mainly enriched in leukocyte endothelial cell migration, vascular endothelial growth factor (VEGF) signaling pathway, and T-cell receptor signaling pathway (Fig. 6).
Figure 6.
GSEA plots showing the most abundant set of all LAM genes in the GSE12027 and GSE160618 datasets. (1) VEGF signaling pathway, (2) leukocyte endothelial cell migration, (3) T-cell receptor signaling pathway. GSEA = Gene Set Enrichment Analysis, KEGG = Kyoto Encyclopaedia of Genes and Genomes, LAM = lymphangioleiomyomatosis, VEGF = vascular endothelial growth factor.
GSEA plots showing the most abundant set of all LAM genes in the GSE12027 and GSE160618 datasets. (1) VEGF signaling pathway, (2) leukocyte endothelial cell migration, (3) T-cell receptor signaling pathway. GSEA = Gene Set Enrichment Analysis, KEGG = Kyoto Encyclopaedia of Genes and Genomes, LAM = lymphangioleiomyomatosis, VEGF = vascular endothelial growth factor.
3.4. PPI network analysis and hub gene selection
We used STRING to construct a PPI network with 91 differential genes, of which there were 36 upregulated genes and 55 downregulated genes, which showed 88 nodes and 76 edges. Then we used Cytoscape for further analysis and we found 10 hub nodes, which were IGF1, HGF, SERPINE1, CXCL12, IGFBP3, LOX, PLAU, BMP2, IGFBP5, and DPP4 (Fig. 7).
Figure 7.
PPI networks. (A) PPI network of DEG, red represents upregulated genes and blue represents downregulated genes (B) subnetwork of the top 10 central genes from the PPI network. Red indicates a high degree of connectivity, yellow indicates a low degree of connectivity. DEG = differentially expressed gene, PPI = protein–protein interaction.
PPI networks. (A) PPI network of DEG, red represents upregulated genes and blue represents downregulated genes (B) subnetwork of the top 10 central genes from the PPI network. Red indicates a high degree of connectivity, yellow indicates a low degree of connectivity. DEG = differentially expressed gene, PPI = protein–protein interaction.Among these 10 hub gene, there are 8 upregulated genes and 2 downregulated genes.
3.5. Reanalysis of the hub gene
We continued the enrichment analysis of hub genes and found that hub genes were mainly enriched in receptor ligand activity, signaling receptor activator activity, regulation of insulin-like, regulation of Saloon athay cell migration, muscle cell migration, and receptor signaling pathway (Fig. 8).
Figure 8.
GO enrichment analysis of hub genes. BP = biological process, CC = cell composition, GO = Gene Ontology, MF = molecular function.
GO enrichment analysis of hub genes. BP = biological process, CC = cell composition, GO = Gene Ontology, MF = molecular function.
3.6. DO analysis
We subjected hub gene to DO analysis and found that these genes were associated with prostate cancer, male reproductive organ cancer, renal cell carcinoma, non–small cell lung carcinoma, kidney cancer, retinal vascular disease, diabetic retinopathy, and pneumonia (Fig. 9).
Figure 9.
DO analysis of hub genes. Bubble plots and bar graphs showing the expression of hub gene in the different diseases involved. DO = Disease Ontology.
DO analysis of hub genes. Bubble plots and bar graphs showing the expression of hub gene in the different diseases involved. DO = Disease Ontology.
4. Discussion
LAM can involve the lungs, mediastinum, and retroperitoneal lymphatics, with the lungs being the most predominantly involved organ. PLAM occurs mostly in women of childbearing age, with studies showing 3 to 8 cases per million women. The current gold standard for the diagnosis of PLAM is lung tissue biopsy, but lung tissue biopsy is invasive and not suitable for large-scale implementation.In this study, we mined 2 datasets, GSE12027 and GSE160618, from the GEO database, and bioinformatically analyzed the sequencing results of the whole RNA of lung LAM tissue and normal lung tissue. Ninety-one DEGs were identified, including 36 upregulated genes and 55 downregulated genes.Enrichment analysis of these DEGs revealed that DEGs were mainly enriched in tumor-related signaling pathways and inflammatory responses.Through the PPI network, we filtered out Hub gene, IGF1, HGF, SERPINE1, CXCL12, IGFBP3, LOX, PLAU, BMP2, IGFBP5, and DPP4.
4.1. Low-grade destructive metastatic tumors
In 2012, it was suggested that PLAM is a low-grade destructive metastatic tumor. 2015 World Health Organization lung tumor classification recognized this concept and classified it as a perivascular epithelioid cell tumor.[In our study, GO enrichment analysis showed that differential genes are mainly associated with epithelial cell proliferation, microenvironment of tumors, produce extracellular matrix. In MF, these differential genes play an important role in tumor signaling. This also proves that LAM is a tumor disease. According to the KEGG results, the RAS signaling pathway and the enrichment of choline metabolism in cancer also suggest that LAM also has a malignant character. GSEA showed that the gene was enriched in the VEGF pathway. Currently, there is clear evidence that VEGF-D can be used as diagnostic evidence for LAM.[ It was found that serum VEGF-D levels were significantly higher in PLAM patients compared to healthy controls.[. The American Thoracic Society/Japanese Respiratory Society recommends VEGF-D ≥ 800 pg/mL as as the threshold value for diagnosis.[ In the hub gene, HGF promotes tumor cell motility, invasion, and metastasis and mediates tumor–mesenchymal interactions. Elevated expression of IGF1 and its receptors showed a correlation with differentiation, proliferation, and prognosis of malignant tumors. IGFBP3 levels are negatively correlated with cancer development. This reciprocal suppression of genes may be responsible for LAM being low-grade destructive metastatic tumors. SERPINE1 is a key regulator of urokinase and tissue-type plasminogen activator. The urokinase-type plasminogen activator system plays a major role in tumor growth, angiogenesis, tumor cell invasion, migration, and metastasis.[ High expression of SERPINE1 increases the risk of metastasis.
4.2. Inflammatory response in LAM
We went on to perform GSEA of the samples and found that in LAM disease, the disease is mainly enriched in leukocyte endothelial cell migration, T-cell receptor signaling pathway, also confirming that inflammatory responses play an important role in LAM disease.Existing studies have shown that the process of leukocyte migration involves: leucocyte margination, capture, rolling, activation, adhesion, and migration.[ The adhesion molecules and other chemokines involved in this process act in a sequential manner.[Adhesion molecules are important molecules that mediate communication between extracellular matrix and cells,and are glycoproteins that regulate cell–cell and cell–extracellular matrix interactions and act as adhesions. In the hub gene, CXCL12 has critical role in the inflammatory response. IL-17A promotes the activation and conversion of lung fibroblasts into myofibroblasts, which secrete significantly increased type I collagen and promote the deposition of extracellular matrix, leading to the development and progression of pulmonary fibrosis, and the chemokine CXCL12 secreted by activated fibroblasts may be involved in this process.[The process of leukocyte migration exacerbates the inflammatory response in the lungs, which may contribute to the large amount of pleural fluid produced by PLAM.
5. Conclusion
PLAMis a systemic disease with a severe inflammatory response as well as borderline tumor. IGF1, IGFBP3, SERPINE1, and CXCL12 are potential key genes for inflammation and tumor invasion in PLAM, which point the way to further research on PLAM.
Acknowledgments
We would like to thank all the staff in Department of Lymphatic Surgery, Beijing Shijitan Hospital of Capital Medical University for their contribution on our research.
Authors: Paul Shannon; Andrew Markiel; Owen Ozier; Nitin S Baliga; Jonathan T Wang; Daniel Ramage; Nada Amin; Benno Schwikowski; Trey Ideker Journal: Genome Res Date: 2003-11 Impact factor: 9.043
Authors: Aravind Subramanian; Pablo Tamayo; Vamsi K Mootha; Sayan Mukherjee; Benjamin L Ebert; Michael A Gillette; Amanda Paulovich; Scott L Pomeroy; Todd R Golub; Eric S Lander; Jill P Mesirov Journal: Proc Natl Acad Sci U S A Date: 2005-09-30 Impact factor: 11.205
Authors: Francis X McCormack; Nishant Gupta; Geraldine R Finlay; Lisa R Young; Angelo M Taveira-DaSilva; Connie G Glasgow; Wendy K Steagall; Simon R Johnson; Steven A Sahn; Jay H Ryu; Charlie Strange; Kuniaki Seyama; Eugene J Sullivan; Robert M Kotloff; Gregory P Downey; Jeffrey T Chapman; MeiLan K Han; Jeanine M D'Armiento; Yoshikazu Inoue; Elizabeth P Henske; John J Bissler; Thomas V Colby; Brent W Kinder; Kathryn A Wikenheiser-Brokamp; Kevin K Brown; Jean F Cordier; Cristopher Meyer; Vincent Cottin; Jan L Brozek; Karen Smith; Kevin C Wilson; Joel Moss Journal: Am J Respir Crit Care Med Date: 2016-09-15 Impact factor: 21.405
Authors: Riffat Meraj; Kathryn A Wikenheiser-Brokamp; Lisa R Young; Francis X McCormack Journal: Semin Respir Crit Care Med Date: 2012-09-21 Impact factor: 3.119