Zulei Li1,2, Kai Zhao2, Hui Tian1. 1. Department of Thoracic Surgery, Qilu Hospital of Shandong University, Jinan, Shandong 250012, P.R. China. 2. Department of Thoracic Surgery, Zibo Central Hospital, Zibo, Shandong 255036, P.R. China.
Abstract
Non-small cell lung cancer (NSCLC) is the most common type of lung cancer, with high morbidity and mortality rates. Numerous diagnosis and treatment methods have been proposed, and the prognosis of NSCLC has improved to a certain extent. However, the mechanisms of NSCLC remain largely unknown, and additional studies are required. In the present study, the RNA sequencing dataset of NSCLC was downloaded from the Gene Expression Omnibus (http://www.ncbi.nlm.nih.gov/geo/). The clean reads obtained from the raw data were mapped to the University of California Santa Cruz human genome (hg19), based on TopHat, and were assembled into transcripts via Cufflink. The differential expression (DE) and differential alternative splicing (DAS) genes were screened out through Cuffdiff and rMATS, respectively. The significantly enriched gene ontology (GO) terms and Kyoto Encyclopedia of Genes and Genomes pathways were obtained through the Database of Annotation, Visualization and Integrated Discovery (DAVID). Different numbers of DE and DAS genes were identified in different types of NSCLC samples, but a number of common functions and pathways were obtained, including biological processes associated with abnormal immune and cell activity. GO terms and pathways associated with substance metabolism, including the insulin signaling pathway and oxidative phosphorylation, were enriched in DAS genes rather than DE genes. Integrated analysis of differential expression and alternative splicing may be helpful in understanding the mechanisms of NSCLC, in addition to its early diagnosis and treatment.
Non-small cell lung cancer (NSCLC) is the most common type of lung cancer, with high morbidity and mortality rates. Numerous diagnosis and treatment methods have been proposed, and the prognosis of NSCLC has improved to a certain extent. However, the mechanisms of NSCLC remain largely unknown, and additional studies are required. In the present study, the RNA sequencing dataset of NSCLC was downloaded from the Gene Expression Omnibus (http://www.ncbi.nlm.nih.gov/geo/). The clean reads obtained from the raw data were mapped to the University of California Santa Cruz human genome (hg19), based on TopHat, and were assembled into transcripts via Cufflink. The differential expression (DE) and differential alternative splicing (DAS) genes were screened out through Cuffdiff and rMATS, respectively. The significantly enriched gene ontology (GO) terms and Kyoto Encyclopedia of Genes and Genomes pathways were obtained through the Database of Annotation, Visualization and Integrated Discovery (DAVID). Different numbers of DE and DAS genes were identified in different types of NSCLC samples, but a number of common functions and pathways were obtained, including biological processes associated with abnormal immune and cell activity. GO terms and pathways associated with substance metabolism, including the insulin signaling pathway and oxidative phosphorylation, were enriched in DAS genes rather than DE genes. Integrated analysis of differential expression and alternative splicing may be helpful in understanding the mechanisms of NSCLC, in addition to its early diagnosis and treatment.
Entities:
Keywords:
RNA-sequencing; differential alternative splicing; differential expression; gene expression omnibus; non-small cell lung cancer
Lung cancer is the leading cause of cancer-associated mortality (accounting for ~13% of all diagnosed cancers), with ~1.8 million new cases in 2012 (1). Between 2011 and 2013, the lung cancer mortality rate increased by 464.84% in China, and this continued to increase (2). Non-small cell lung cancer (NSCLC) constitutes almost 85% of all lung cancers and is usually diagnosed an advanced stage (3). Surgery followed by chemoradiotherapy is the most commonly adopted method for locally advanced NSCLC, but the efficacy is diverse across different patients (4). In previous years, immunotherapy became an optional therapeutic for numerous types of cancers, including melanoma and lung cancer, through sustained activation of the immune system (5,6). Certain immune checkpoint inhibitors have been shown to significantly improve the prognosis of a number of cancers, including cytotoxic T-lymphocyte antigen 4, and nivolumab may prolong the median survival time of patients with melanoma and NSCLC by 2–4 and 6–9 months, respectively (7). However, the morbidity and mortality rate of NSCLC remains high, and additional molecular biology studies are required for more precise treatment and earlier diagnosis.Next generation RNA sequencing (RNA-Seq) is becoming a popular technology for the quantification of RNA, including mRNA (such as microRNA) and noncoding RNA (such as long noncoding RNA). Variation of genomic structure, including alternative splicing (AS) and single nucleotide polymorphisms, can also be detected by RNA-Seq. Compared with gene microarray, the most evident advantage of RNA-Seq is its ability to detect the expression values of low abundance as well as novel transcripts (8). Previous studies of NSCLC, based on RNA-Seq have been conducted and successfully identified numerous potential targets, involved in the progression or suppression of cancer. Riccardo et al identified the adenosine A3 receptor as a valuable target for NSCLC, based on the combination of gene fusion and differential expression analysis through RNA-Seq (9). In the previous study by Han et al, via RNA-Seq analysis, chromobox 3 and cellular retinoic acid-binding protein 2 were also considered to contribute to the progression of NSCLC (10).Compared with differential expression analysis at the gene level, exon level analysis may improve analysis of disease stages (11). The present study aimed to identify the potential targets of NSCLC, through the combined analysis of differential expression and AS, and based on the RNA-Seq dataset from the Gene Expression Omnibus (GEO). This may be helpful for the precise treatment of NSCLC, as well as early diagnosis.
Materials and methods
RNA-Seq dataset
The RNA-Seq dataset (GSE68795) in the present study was obtained from GEO (http://www.ncbi.nlm.nih.gov/geo/), which was accumulated by Durrans et al (12). A total of 17 samples were included, containing three immature monocytic myeloid cell (IMMC) samples, two epithelial cell (Epi) samples and two neutrophil (Neu) samples from lung cancerpatients, as well as three IMMC samples, three Epi samples and four Neu samples from adjacent normal lung tissues. Illumina HiSeq 2000 (Illumina, Inc., San Diego, CA, USA) was used for the sequencing process. Briefly, total RNA was extracted from flow cytometry sorted cells; TruSeq RNA Sample Preparation kit (Illumina, Inc.) was used for the preparation of cDNA libraries from 15–35 ng RNA; cDNA libraries that passed size and purity check were retained for the following sequencing. Single-end 51 bp short sequences (reads) were generated for the IMMC and Neu samples, while paired-end 102 bp reads were generated for the Epi samples, in lung cancer and adjacent normal lung tissues.
Reads mapping and assembling
Quality control of raw reads was conducted using FastQC software version 1.3, which was developed by Andrews (13), with the default parameters, i.e., reads with a quality score <10 and N >5% were discarded. The remaining reads were mapped to the UCSC genome (version GRCh37/hg19) through TopHat (2.1.0.Linux_x86_64) (14), a fast splice junction mapper for RNA-Seq reads, with no more than 2 mismatches in 25 bp segments. Cufflinks (14) (2.2.1.Linux_x86_64) was used for the assembly of the mapped reads, which allowed for the identification of novel transcripts, and fragments per kilobase of exon per million fragments mapped (FPKM) representation of gene expression value was obtained.
Differential expression analysis
Cuffdiff of Cufflinks was used to test the significance of differential expression of genes based on FPKM. Genes with fold change (FC) >2 (upregulated) or <0.5 (downregulated), and false discovery rate (FDR) adjusted for P<0.05, were considered to be differentially expressed.
Differential alternative splicing analysis
The replicate multivariate analysis of transcript splicing (rMATS) (15), developed by Shen et al, was used to screen differential alternative splicing genes across samples. The mapping results (in bam format) were submitted to rMATS. Annotation of genes (in GTF format) was obtained from the UCSC and used for the screening of known splicing sites. Finally, five main alternative splicing types, including skipping exon (SE), retention intron (RI), alternative 5′splice site (A5SS), alternative 3′splice site (A3SS) and mutually exclusive exons (MXE), which satisfied the criteria of FDR <0.1, were screened out as DAS genes.
Functional enrichment analysis
The DE and DAS genes were submitted to the Database for Annotation, Visualization and Integrated Discovery (DAVID; https://david.ncifcrf.gov/) (16) for the analysis of enrichment of gene ontology (GO) terms and Kyoto Encyclopedia of Genes and Genomes (KEGG) pathways. P<0.05 was considered to indicate a statistically significant difference for the screening of significant GO terms and pathways.
Results
RNA-Seq landscape
The average number of reads obtained from the RNA-Seq dataset was 60242792, and at least 35 million reads were prepared for every sample, as shown in Table I. The clean reads were mapped to the human genome (hg19) by TopHat and 86.50–91.90% unique mapped rates were obtained across all samples. These were used for the following analysis.
Table I.
Mapping overview of all the RNA-Sequencing experiments.
The unique mapped reads were assembled into transcripts by cufflinks, and FPKM values were obtained to quantify the mRNA expression level. Based on the criteria of FC >2 or <0.5, and FDR adjusted P<0.05, umerous genes were revealed to be differential expression in tumor samples compared with normal samples.In Epi tumor samples, a total of 1,624 DE genes were identified, which contained 915 downregulated and 709 upregulated genes. However, in IMMCtumor samples, 1,593 DE genes (639 downregulated and 954 upregulated genes) were obtained. In Neu tumor samples, 1,672 DE genes were obtained, which consisted of 1,141 downregulated and 1,672 upregulated genes. Furthermore, as shown in Fig. 1, 111 overlapping genes were identified among those three groups of DE genes, and 91 of those overlaps were consistently upregulated or downregulated across all three types of NSCLC tumor cells.
Figure 1.
Number of differential expression genes of Epi, IMMC and Neu tumor samples compared with the corresponding normal samples. Epi, epithelial cell; IMMC, immature monocytic myeloid cell; Neu, neutrophil.
AS in NSCLC tumor samples
Through AS, a mRNA precursor may result in multiple mature mRNAs with different functions. In the present study, the significant DAS genes were identified for the tumor samples compared with the normal ones through rMATS, with the threshold of FDR <0.1. Notably, alternative splicing of genes in Neu tumor samples was not found to be significantly different compared with Neu normal samples. A total of 2,130 and 1,713 DAS genes were identified in Epi and IMMCtumor samples, respectively. In total, 272 overlapping genes were screened out between the two lists of DAS genes.A total of 5 types of AS events were identified, including A3SS, A5SS, MXE, RI and SE. As shown in Fig. 2, SE is the most prevalent AS event in Epi and IMMCtumor samples, and accounts for 84.00 and 79.28% of all the AS events in Epi and IMMC, respectively. RI is the least prevalent AS event, which only accounts for 1.64 and 2.80% of all the AS events in Epi and IMMC, respectively.
Figure 2.
Distribution of different alternative splicing events in Epi and IMMC tumor samples. Epi, epithelial cell; IMMC, immature monocytic myeloid cell; MXE, mutually exclusive exons; RI, retention intron; SE, skipping exon; A3SS, alternative 3′splice site; A5SS, alternative 5′splice site.
GO and pathway analysis
Functional enrichment analysis based on the DAVID identified a number of significantly enriched GO terms and KEGG pathways in DE and DAS genes. The top five most significant GO terms of DE genes of the three groups of NSCLC tumor samples are listed in Table II. As shown in Table II, biological processes associated with the cell and immune system were significantly enriched (P<0.05) in the DE genes of all the three types of NSCLC tumor samples. Enriched pathways of DE genes of Epi, IMMC and Neu tumor samples are listed in Tables III, IV and V, respectively. Similar to the enriched GO terms, those pathways were mainly involved in the dysregulation of cell division and abnormal activity of the immune and inflammatory systems.
Table II.
Top five most significantly enriched GO terms of differential expression genes of the three groups of non-small cell lung cancer samples.
Kyoto Encyclopedia of Genes and Genomes pathways of differential expression genes in neutrophil tumor samples.
Pathway name
Gene number
P-value
Focal adhesion
61
4.81×10−7
Regulation of actin cytoskeleton
62
2.59×10−6
Fc gamma R-mediated phagocytosis
34
5.94×10−6
Chemokine signaling pathway
53
2.67×10−5
Leukocyte transendothelial migration
37
5.92×10−5
ECM-receptor interaction
29
7.03×10−5
Axon guidance
37
4.28×10−4
Pathogenic Escherichia coli infection
20
1.01×10−3
Hypertrophic cardiomyopathy
26
1.40×10−3
Adherens junction
24
1.71×10−3
Pathways in cancer
73
2.62×10−3
Dilated cardiomyopathy
26
4.57×10−3
Valine, leucine and isoleucine degradation
15
7.22×10−3
Glioma
19
8.70×10−3
Natural killer cell mediated cytotoxicity
32
0.018518
Non-small cell lung cancer
16
2.06×10−2
Gap junction
23
2.30×10−2
Aldosterone-regulated sodium reabsorption
13
2.45×10−2
Endocytosis
41
2.55×10−2
MAPK signaling pathway
56
2.78×10−2
Progesterone-mediated oocyte maturation
22
2.95×10−2
Arrhythmogenic right ventricular cardiomyopathy
20
2.97×10−2
ErbB signaling pathway
22
3.32×10−2
Fc epsilon RI signaling pathway
20
3.81×10−2
p53 signaling pathway
18
3.84×10−2
Colorectal cancer
21
4.23×10−2
Cell adhesion molecules
30
4.58×10−2
Glycerolipid metabolism
13
4.84×10−2
B cell receptor signaling pathway
19
4.89×10−2
ECM, extracellular matrix; MAPK, mitogen-activated protein kinase.
For DAS genes in Epi and IMMCtumor samples, their enriched GO terms are shown in Fig. 3, which was obtained by submitting the DAVID result to the WEGO website (http://wego.genomics.org.cn/cgi-bin/wego/index.pl), and their enriched KEGG pathways are shown in Fig. 4. As expected, those DAS genes were also involved in cell and immune abnormality. The insulin signaling pathway and oxidative phosphorylation, which were associated with substance metabolism and were not revealed by DE genes, were also found to be significantly enriched (P<0.05) in Epi and IMMCtumor samples, respectively. DE and DAS genes may be complementary in the understanding of mechanisms of NSCLC.
Figure 3.
Significantly enriched gene ontology terms of alternative splicing genes in (A) epithelial cell and (B) immature monocytic myeloid cell tumor samples. BP, biological process; MF, molecular function; CC, cellular component.
Figure 4.
Significantly enriched Kyoto Encyclopedia of Genes and Genomes pathways of alternative splicing genes in (A) epithelial cell and (B) immature monocytic myeloid cell tumor samples. PPAR, peroxisome proliferator-activated receptor; IgA, immunoglobulin A; Jak-STAT, Janus kinase-signal transducers and activators of transcription; NOD, non-obese diabetic; TCA, tricarboxylic acid.
Discussion
Lung cancer remains the leading cause of cancer-associated mortality in recent years, and its main type is NSCLC (17). An increasing number of studies about NSCLC have been emerging and numerous potential biomarkers for the early diagnosis or treatment of NSCLC have been identified subsequent to the development of molecular biology technologies. However, the mechanisms of NSCLC remain largely unknown, and additional studies are required to improve its prognosis. In the present study, through re-analysis of the RNA-Seq dataset from the GEO database, the significant DE and DAS genes were screened out in three types of NSCLC tumor samples (Epi, IMMC and Neu), compared with the corresponding normal samples, and their enriched GO terms and KEGG pathways were obtained. The present study aimed to provide results to help with the development of therapeutic methods and drugs for NSCLC.The rapid growth of RNA-Seq provides the chance to observe variation in the genome at the exon level, and screening of alternative splicing is one of its important applications (18). AS is an important mechanism that could result in proteins with different functions from one mRNA precursor, and it has been shown to affect the status of numerous diseases. AS events involving exons 15–19 of d'Origine nantais tyrosine kinase have been screened out in lung cancer (19). In addition, the T cell factor 4K AS isoform was found to improve the prognosis of NSCLC by suppressing the proliferation and metastasis of tumor cells (20). In the present study, significant alternative splicing was not observed in any genes in the Neu tumor samples compared with the normal samples. Numerous AS genes were identified in Epi and IMMCtumor samples, which may indicate the differences of those three types of cells in the inducement of NSCLC.In accordance with previous studies, SE is the most prevalent AS event, while RI is the least prevalent AS event in Epi and IMMCtumor samples (21,22). A total of 272 genes demonstrated alternative splicing in Epi and IMMCtumor samples, and this may be an indication of the important roles of those genes in the progression of NSCLC. Compared with AS analysis, numerous DE genes were identified in all three types of tumor samples, and 111 overlapping genes were screened out. One of the significant differences between the DE genes and DAS genes is the involvement of substance metabolism-associated pathways of DAS genes, including the insulin signaling pathway and oxidative phosphorylation. Insulin-like growth factor (IGF-1R) performs important roles through its downstream signaling in a number of solid tumors, including NSCLC (23). Furthermore, IGF-1R affects cisplatin and radiation resistance in lung cancer (24). In addition, oxidative stress/activity is affected by IGF-1R in numerous diseases, resulting in the dysregulation of biosystems, such as in cancer or neurodegeneration (25,26). DE and DAS analysis revealed a number of common GO terms and pathways, and the primary terms and pathways were cell or immune-associated. These have been investigated in numerous studies and were confirmed to play important roles in the development of numerous types of cancers, including lung cancer, gastric cancer and liver cancer (27–29).Based on the integrated analysis of DE and DAS, alternative splicing was identified by leucine-rich repeat kinase 2 (LRRK2) in Epi and IMMCtumor samples, and was consistently downregulated in all three types of tumor cells. LRRK2 is a member of the leucine-rich repeat kinase family and encodes a protein with an ankryin repeat region. LRRK2 has been shown to play important roles in neurodegeneration diseases, including Parkinson's disease (30–32). The role of LRRK2 in cancer development remains unclear; however, certain indirect associations were identified (33). For example, through disrupting the activation of the proto-oncogene MET, LRRK2 may prevent tumor cell proliferation (34). Therefore, it may be a novel target for NSCLC.In conclusion, through the combined analysis of DE and DAS, potential targets for NSCLC were screened out. Furthermore, DE and DAS may be complementary for understanding the mechanisms of NSCLC. The present study may be helpful in the diagnosis and treatment of NSCLC.
Authors: Sara Pilotto; Miguel Angel Molina-Vila; Niki Karachaliou; Luisa Carbognin; Santiago Viteri; Maria González-Cao; Emilio Bria; Giampaolo Tortora; Rafael Rosell Journal: Transl Lung Cancer Res Date: 2015-12
Authors: Brendan D Looyenga; Kyle A Furge; Karl J Dykema; Julie Koeman; Pamela J Swiatek; Thomas J Giordano; Andrew B West; James H Resau; Bin T Teh; Jeffrey P MacKeigan Journal: Proc Natl Acad Sci U S A Date: 2011-01-10 Impact factor: 11.205
Authors: Shihao Shen; Juw Won Park; Zhi-xiang Lu; Lan Lin; Michael D Henry; Ying Nian Wu; Qing Zhou; Yi Xing Journal: Proc Natl Acad Sci U S A Date: 2014-12-05 Impact factor: 11.205
Authors: Anna Durrans; Dingcheng Gao; Ravi Gupta; Kari R Fischer; Hyejin Choi; Tina El Rayes; Seongho Ryu; Abu Nasar; Cathy F Spinelli; Weston Andrews; Olivier Elemento; Daniel Nolan; Brendon Stiles; Shahin Rafii; Navneet Narula; Ramana Davuluri; Nasser K Altorki; Vivek Mittal Journal: PLoS One Date: 2015-06-05 Impact factor: 3.240
Authors: Sean F Monaghan; Debasree Banerjee; Chun-Shiang Chung; Joanne Lomas-Neira; Kamil J Cygan; Christy L Rhine; William G Fairbrother; Daithi S Heffernan; Mitchell M Levy; William G Cioffi; Alfred Ayala Journal: Mol Med Date: 2018-06-18 Impact factor: 6.354