
RNA-Seq Analysis Identified XLOC_009190 as Potential Therapeutic Target for Lung Adenocarcinoma.

Jing Wang1, Ning Wang1, Wei-Dong Zhao2, Li-Xian Zhao2, Yong-Guang Jing3, Li-Jie Yang2, Jie He2, Jun Li1.   

Abstract

BACKGROUND: Abnormal regulation of lncRNA expression has been linked to multiple kinds of cancer, including lung adenocarcinoma.
METHODS: In this study, we carried out RNA-Seq on three tumors and their paired normal samples from Chinese patients with lung adenocarcinoma. All transcripts were de novo assembled, and the candidate lncRNAs among them were predicted with tools including PLEK, CNCI, CPC, Blastp, and hmmscan. Their expression levels, together with those of the annotated mRNAs, were quantified. Weighted correlation network analysis and differential expression analysis were carried out to infer the biological function of these novel lncRNAs.
RESULTS: The weighted correlation network analysis showed that the lncRNAs that were highly correlated with protein-coding genes participated in various pathways, including the PI3 kinase pathway. These lncRNAs were important regulators in biological processes. Next, the differentially expressed lncRNAs were identified, including four known lncRNAs and one novel lncRNA (XLOC_009190). The cis-regulation of this novel lncRNA might act on MGST1, which protects cells through its conjugation and glutathione peroxidase functions. The trans-regulation of this lncRNA was investigated through its correlated mRNAs. The results showed that it possibly plays a role in transmembrane receptors such as G protein-coupled receptors and potassium channels.
CONCLUSION: We proposed the potential biological function of XLOC_009190, but further experiments are needed to elucidate its roles and its potential as a therapeutic target.
© 2019 Wang et al.

Keywords:  RNA-Seq; de novo assembly; lncRNAs; lung adenocarcinoma

Year:  2019        PMID: 31908488      PMCID: PMC6927261          DOI: 10.2147/OTT.S225532

Source DB:  PubMed          Journal:  Onco Targets Ther        ISSN: 1178-6930            Impact factor:   4.147


Introduction

Approximately 20,000 protein-coding genes account for less than two percent of the human genome.5 However, over seventy percent of the genome is thought to be transcribed into RNA, most of which is non-coding.19 Long non-coding RNAs (lncRNAs), defined as non-coding RNAs longer than 200 nucleotides, are among the most important non-coding RNAs. In the past, lncRNAs were regarded as transcriptional noise without value, but they have since been shown to be essential regulators in the cell and new targets in disease research.23 Abnormal regulation of lncRNA expression has been linked to multiple kinds of cancer. For example, MALAT1, which is abundantly expressed in the nucleus, was shown to participate in metastasis and recurrence by regulating the biological activity of the E2F1 transcription factor.30 Tano et al26 reported that depletion of MALAT1 reduces tumorigenicity. Another lncRNA, lincRNA-p21, serves as a repressor in p53-dependent transcriptional responses through physical binding with hnRNP-K.8 H19, an estrogen-inducible lncRNA, was reported to be involved in cell survival and estrogen-induced cell proliferation.25 Evidence for the functions of lncRNAs in the initiation and progression of lung adenocarcinoma is also emerging. HOTAIR is highly expressed in lung cancer and targets chromatin repressor polycomb proteins, which results in increased cancer invasiveness and metastasis.18 Yan-Rong et al33 reported that expression of the lncRNA PVT1 was elevated in lung cancer cells and that the increased expression was associated with progression and metastasis. MEG3 is suppressed in lung cancer cells, and re-constitution of MEG3 led to decreased cell proliferation and increased apoptosis.11

It was estimated that lung adenocarcinoma led to 150,000 deaths in the US in 2018, which makes it one of the deadliest tumors in the world. By the time lung cancer is diagnosed, it is usually locally advanced or metastatic. As a result, the 5-year survival rate of patients with lung adenocarcinoma, estimated at approximately 18%, is lower than that of other primary cancers. More studies are therefore needed to understand the progression and prognosis of lung adenocarcinoma and to support targeted treatment. Moreover, lncRNAs have been neglected in many studies. In this study, we collected three tumors and three paired normal samples from Chinese patients with lung adenocarcinoma and carried out RNA-Seq. We aimed to identify the known and novel lncRNAs that participate in the initiation and progression of lung adenocarcinoma. Our results provide some insight into the roles of lncRNAs in lung cancer and suggest new targets for lung cancer research.

Methods

Sample Collection, RNA Isolation, and NGS Sequencing

The tumor and paired normal samples were collected from three lung adenocarcinoma patients with written informed consent. Total RNA was prepared with the PureLink Total RNA purification kit (Invitrogen, USA). The quality and integrity of the RNA were assessed with the Qubit 3.0 Fluorometer (Life Technologies, USA) and the Agilent 2100 RNA Nano 6000 Assay Kit (Agilent Technologies, USA). Three micrograms of RNA per sample were used for the following steps. rRNA was removed with Ribo-Zero Gold Kits (Illumina, USA). Twelve libraries (six samples with two replicates each) were constructed using the NEBNext Ultra Directional RNA Library Prep Kit for Illumina (NEB, USA) according to the product instructions. They were finally sequenced on an Illumina HiSeq 2500 with 150 bp paired-end reads.

Transcriptome Assembly and Quantification

Quality control was first executed for the RNA-Seq data from each sample by fastp.4 The RNA-Seq data, all passing the step of quality control, were then mapped to the human GRCh38 genome by using TopHat.28 The mapped reads then were de novo assembled using Cufflinks.29 The assembled transcripts were merged and quantified by Cuffmerge and Cuffnorm, respectively. On the other hand, the quantification of mRNAs was executed by RSEM with Gencode v29 as the annotation.15

Identification of Novel lncRNAs

First, transcripts with only one exon were excluded. The sequences of assembled transcripts longer than 200 bp were extracted from the merged GTF file by an internal Python script. PLEK,14 which predicts lncRNAs or mRNAs based on a k-mer scheme, and CNCI,16 which profiles adjoining nucleotide triplets, were used to distinguish lncRNAs from mRNAs. Only transcripts predicted as non-coding by both tools were kept for the next step. The protein-coding potential of the remaining transcripts was assessed by CPC.13 The possible open reading frames (ORFs) were extracted from the transcripts using TransDecoder. Any transcript containing an ORF longer than 100 amino acids was excluded. The protein sequence of the longest ORF of each remaining transcript was extracted. These proteins were searched against Pfam-A by hmmscan6 and against the NCBI protein database by Blastp.17 All transcripts with hits at E-values below 1E-6 were removed, because lncRNAs should not be similar to any known protein.
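The first two filters above (more than one exon, spliced length over 200 bp) can be sketched in Python as follows. This is only an illustration, not the authors' internal script: the function name, the example records, and the transcript IDs are hypothetical, and the sketch assumes a standard tab-separated GTF in which exon rows carry a `transcript_id` attribute.

```python
import re
from collections import defaultdict


def multi_exon_long_transcripts(gtf_lines, min_length=200, min_exons=2):
    """Return sorted IDs of transcripts with at least `min_exons` exons
    and a spliced length (sum of exon lengths) greater than `min_length`."""
    exon_count = defaultdict(int)
    spliced_length = defaultdict(int)
    for line in gtf_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comments
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9 or fields[2] != "exon":
            continue  # only exon records contribute
        match = re.search(r'transcript_id "([^"]+)"', fields[8])
        if match is None:
            continue
        transcript_id = match.group(1)
        start, end = int(fields[3]), int(fields[4])
        exon_count[transcript_id] += 1
        # GTF coordinates are 1-based and inclusive on both ends.
        spliced_length[transcript_id] += end - start + 1
    return sorted(
        t for t in exon_count
        if exon_count[t] >= min_exons and spliced_length[t] > min_length
    )


def _exon(chrom, start, end, transcript_id):
    """Build one hypothetical GTF exon line for the example below."""
    attributes = f'gene_id "XLOC_demo"; transcript_id "{transcript_id}";'
    return "\t".join(
        [chrom, "Cufflinks", "exon", str(start), str(end), ".", "+", ".", attributes]
    )


# Hypothetical records: A has two exons (250 bp), B has one exon (1000 bp),
# C has two exons but only 150 bp in total.
EXAMPLE_GTF = [
    _exon("chr12", 1000, 1149, "TCONS_A"),
    _exon("chr12", 2000, 2099, "TCONS_A"),
    _exon("chr12", 5000, 5999, "TCONS_B"),
    _exon("chr12", 100, 149, "TCONS_C"),
    _exon("chr12", 300, 399, "TCONS_C"),
]

if __name__ == "__main__":
    print(multi_exon_long_transcripts(EXAMPLE_GTF))
```

Only the transcript that passes both criteria survives; the coding-potential filters (PLEK, CNCI, CPC, ORF length, hmmscan, Blastp) would then be applied to this reduced set.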

WGCNA Correlation Network Analysis

The expected counts of the lncRNAs and mRNAs were combined and normalized with the R package DESeq.1 Next, to keep the network from becoming too large, we extracted all lncRNA–mRNA pairs whose absolute correlation exceeded 0.9. Weighted correlation network analysis (WGCNA) was then carried out with the R package WGCNA to find modules of highly correlated genes.12,34 Statistical overrepresentation tests and pathway annotation were carried out on the protein-coding genes in each module with the PANTHER web server.20,21 Fisher’s exact test was used with FDR correction for multiple testing, and 0.05 was set as the significance cutoff.
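The pair-filtering step can be illustrated with a small pure-Python sketch. The text does not state which correlation coefficient was used, so Pearson correlation is an assumption here, and the function names and expression values are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant profile carries no correlation signal
    return cov / math.sqrt(var_x * var_y)


def correlated_pairs(lncrna_expr, mrna_expr, cutoff=0.9):
    """Keep lncRNA-mRNA pairs whose absolute correlation exceeds `cutoff`."""
    pairs = []
    for lnc_id, lnc_values in lncrna_expr.items():
        for mrna_id, mrna_values in mrna_expr.items():
            r = pearson(lnc_values, mrna_values)
            if abs(r) > cutoff:
                pairs.append((lnc_id, mrna_id, r))
    return pairs


# Hypothetical normalized expression across six samples.
LNCRNA_EXPR = {"XLOC_demo": [1, 2, 3, 4, 5, 6]}
MRNA_EXPR = {
    "GENE_UP": [2, 4, 6, 8, 10, 12],   # perfectly positively correlated
    "GENE_DOWN": [6, 5, 4, 3, 2, 1],   # perfectly negatively correlated
    "GENE_WEAK": [3, 1, 4, 1, 5, 9],   # correlation of about 0.7
}

if __name__ == "__main__":
    for lnc_id, mrna_id, r in correlated_pairs(LNCRNA_EXPR, MRNA_EXPR):
        print(lnc_id, mrna_id, round(r, 3))
```

Using the absolute value keeps both positively and negatively correlated partners, which matches the wording "absolute value of correlation" in the text.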

Differential Expression Analysis of Tumor-Related lncRNAs

After normalization, the differentially expressed genes, including mRNAs and lncRNAs, were identified by DESeq.1 The p-values for lncRNAs were corrected using FDR, and lncRNAs with an FDR below 0.05 were regarded as tumor-related lncRNAs. We re-mapped these lncRNAs to the human genome with BLAT to decide whether they were already known or novel. The functional analysis of the novel lncRNA had two parts. First, to assess cis-regulation, the 100 kb upstream and downstream of the lncRNA were scanned for possible targets. Second, the mRNAs whose expression levels were highly correlated with this novel lncRNA were identified, and Slimmer gene ontology enrichment27 was carried out with the PANTHER web server.20
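The FDR correction can be sketched as below. The text says only that p-values were "corrected using FDR"; the Benjamini-Hochberg procedure shown here is an assumption about which FDR method was meant, and the p-values are made up for illustration.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(p_values)
    # Indices of the p-values, from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so that adjusted values stay monotone.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


# Hypothetical raw p-values for four lncRNAs (not taken from the study).
RAW_P = [0.01, 0.04, 0.03, 0.005]

if __name__ == "__main__":
    fdr = benjamini_hochberg(RAW_P)
    significant = [i for i, q in enumerate(fdr) if q < 0.05]
    print(fdr, significant)
```

Note that the adjustment depends on the total number of tests, so it has to be run over every tested lncRNA rather than over a pre-selected subset.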

Results

Genome-Wide Identification of lncRNAs

Understanding the characteristics of lncRNAs in patients with lung adenocarcinoma would be valuable for exploring functional mechanisms and new therapeutic targets. To obtain a de novo assembled transcriptome of lncRNAs, we generated 12 RNA-Seq datasets in total from three patients. A step-wise protocol was applied to ensure high-confidence discovery of lncRNAs (Figure 1A).
Figure 1

The genome-wide identification of lncRNAs. (A) The flow chart for the identification and following analysis of lncRNAs. (B) The bar plots displayed the count of transcripts from each of the six samples and the addition of all samples that were filtered out in the four steps from the flow chart “Filter of Transcripts from lncRNAs”.

The libraries were first mapped to the reference genome, and the successfully mapped high-quality reads were then used for transcriptome assembly. The de novo assembly produced 48,680 transcripts in total (Figure 1B). Of these, 27,161 transcripts with only one exon were removed because of possible assembly errors. Further analysis based on length and coding potential suggested that 1,891 of the transcripts were candidate lncRNAs. The proportions of transcripts filtered out in each step were similar in all samples (Figure 1B).

Co-Expression Analysis Between lncRNAs and mRNAs

To keep the network from becoming too large, the lncRNA–mRNA pairs were filtered by their correlation coefficients using a cutoff of 0.8. The highly correlated lncRNA–mRNA pairs, comprising 2372 mRNAs and 284 lncRNAs, were retained for WGCNA. WGCNA identified 3 modules of highly correlated genes (Figure 2). The lncRNAs were expected to have biological functions related to those of the mRNAs within the same module. The genes in the blue module appeared to have a higher overlap.
Figure 2

The gene network for lncRNA-associated mRNAs by WGCNA. The heatmap shows the topological overlap matrix among the lncRNA-associated mRNAs. The light yellow represents low overlap and the red color represents higher overlap.

A statistical overrepresentation test hinted that the mRNAs in the blue module were enriched in the phosphoinositide 3-kinase (PI3 kinase) pathway (Table 1). The mRNAs in the turquoise module, on the other hand, were associated with the ubiquitin-proteasome pathway and the platelet-derived growth factor (PDGF) signaling pathway. No pathway was enriched for the mRNAs in the brown module. These results provide a basis for exploring lncRNA functions, suggesting that some lncRNAs participate in the regulation of PI3 kinase, ubiquitin, and PDGF.
Table 1

The Pathway Enrichment of Protein-Coding Genes in Identified Modules

| Module | PANTHER ID | Description of Pathway | Fold Change | Raw P-Value | FDR |
|---|---|---|---|---|---|
| Turquoise | P00060 | Ubiquitin proteasome pathway | 4.12 | 2.03E-04 | 3.33E-02 |
| Turquoise | P00047 | PDGF signaling pathway | 2.67 | 2.41E-04 | 1.98E-02 |
| Blue | P00048 | PI3 kinase pathway | 4.56 | 1.76E-04 | 2.89E-02 |
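The two per-pathway statistics behind an overrepresentation test can be sketched as follows. This is a simplified illustration rather than PANTHER's implementation: it assumes a one-sided test (the upper tail of the hypergeometric distribution, which is equivalent to a one-sided Fisher's exact test), and the gene counts are hypothetical.

```python
from math import comb


def fold_enrichment(k, n, K, N):
    """Observed fraction of pathway genes in the module over the background fraction.

    k: module genes in the pathway, n: module size,
    K: background genes in the pathway, N: background size.
    """
    return (k / n) / (K / N)


def hypergeom_upper_tail(k, n, K, N):
    """P(X >= k) when n genes are drawn from N, of which K belong to the pathway."""
    total = comb(N, n)
    upper = min(n, K)
    favourable = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return favourable / total


if __name__ == "__main__":
    # Hypothetical counts: 3 of 4 module genes fall in a pathway
    # that covers 5 of 20 background genes.
    print(fold_enrichment(3, 4, 5, 20))
    print(hypergeom_upper_tail(3, 4, 5, 20))
```

The raw p-values from all tested pathways would then be corrected for multiple testing before the 0.05 cutoff is applied.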

Identification of Tumor-Related lncRNAs

We performed differential expression analysis contrasting tumor and normal tissues. In total, five lncRNAs were significantly up-regulated in tumor samples (Figure 3). Four of the five were previously annotated (Table 2), while XLOC_009190 had not been described before. Mapping the four known lncRNAs to genomic coordinates showed their similarity to the annotated lncRNAs (Figure 4). XLOC_009190 was located on chromosome 12.
Figure 3

The differentially expressed lncRNAs between normal and tumor. In total, five lncRNAs were significantly up-regulated in tumor samples.

Table 2

The Details of the Differentially Expressed lncRNAs Between Normal and Tumor

| Assembled Gene | Known lncRNA Gene Symbol | Known lncRNA Ensembl Gene ID | Mean Expression (Normal) | Mean Expression (Tumor) | Fold Change | P-Value |
|---|---|---|---|---|---|---|
| XLOC_001000 | LINC01160 | ENSG00000231346 | 2.03 | 27.64 | 13.59 | 9.38E-03 |
| XLOC_030637 | LINC02475 | ENSG00000251350 | 0.36 | 11.56 | 31.74 | 4.40E-02 |
| XLOC_036188 | AC005100.1 | ENSG00000285960 | 0.67 | 74.82 | 111.17 | 3.04E-02 |
| XLOC_040183 | BX890604.2 | ENSG00000285756 | 106.42 | 311.49 | 2.93 | 1.40E-02 |
| XLOC_009190 | | | 46.63 | 134.11 | 2.88 | 3.26E-02 |
Figure 4

The mapping of four differentially expressed lncRNAs, whose sequences were similar to known lncRNAs, on the genomic coordinates.

To explore the potential biological function of XLOC_009190, the protein-coding genes located within ±100 kb were identified (Figure 5A). Only MGST1 was found, located upstream, and it might serve as the target of XLOC_009190 via cis-regulation. The potential trans-regulated mRNAs, on the other hand, were obtained from the correlation of their expression levels with that of XLOC_009190. The enriched gene ontology terms of these mRNAs hinted that XLOC_009190 might act in the detection of stimulus and transmembrane signaling receptor activity (Figure 5B).
Figure 5

The biological function of novel lncRNA XLOC_009190. (A) The genomic coordinates showed that MGST1 was located within ±100 kb of XLOC_009190. (B) The GO term enrichment for the mRNAs whose expression levels were associated with XLOC_009190.
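The ±100 kb scan for candidate cis targets amounts to an interval-overlap query, which a minimal sketch can make concrete. All coordinates and gene names below are hypothetical (they are not the real positions of XLOC_009190 or MGST1), and "upstream" is taken relative to plus-strand coordinates, which is an assumption.

```python
def genes_within_window(lnc_start, lnc_end, genes, flank=100_000):
    """List genes on the same chromosome that overlap the lncRNA +/- `flank` bp.

    genes: iterable of (name, start, end) with 1-based inclusive coordinates.
    Returns (name, position) tuples, where position is relative to the lncRNA
    in plus-strand coordinates.
    """
    window_start = max(1, lnc_start - flank)
    window_end = lnc_end + flank
    hits = []
    for name, start, end in genes:
        if start <= window_end and end >= window_start:  # intervals overlap
            if end < lnc_start:
                position = "upstream"
            elif start > lnc_end:
                position = "downstream"
            else:
                position = "overlapping"
            hits.append((name, position))
    return hits


# Hypothetical protein-coding genes on the lncRNA's chromosome.
EXAMPLE_GENES = [
    ("GENE_NEAR", 430_000, 450_000),  # inside the upstream flank
    ("GENE_FAR", 100_000, 120_000),   # too far upstream
    ("GENE_DOWN", 650_000, 700_000),  # just beyond the downstream flank
]

if __name__ == "__main__":
    print(genes_within_window(500_000, 510_000, EXAMPLE_GENES))
```

A gene counts as a hit when any part of it falls inside the window; requiring the whole gene to lie inside would be a stricter alternative.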

Discussion

In this study, we assembled 1,891 lncRNAs from three patients with lung adenocarcinoma and identified four known lncRNAs and one novel lncRNA that may play a role in cancer. Our results hinted that some lncRNAs were closely correlated with protein-coding genes, which could be categorized into three modules. In the first module, the mRNAs and lncRNAs were significantly enriched in the PI3 kinase pathway. Genes in the PI3K pathway are involved in cell survival, regulation of gene expression and cell metabolism, and cytoskeletal rearrangements.21 These genes are also known to be frequently altered in multiple cancers, including carcinomas of the lung. Because these frequent alterations include mutations and amplifications that result in over-activation of certain upstream or downstream mediators, targeting genes in the PI3K signaling pathway can be a promising therapeutic approach in cancer treatment.32 The lncRNAs assigned to this module may be potential targets as well.

Protein-coding genes in another module were associated with the ubiquitin-proteasome pathway and the PDGF signaling pathway. The ubiquitin-proteasome pathway is involved in protein degradation by the proteasome machinery. Numerous studies have shown that this pathway is occasionally the target of tumor-related deregulation, which can underlie oncogenic transformation, tumor progression, metastasis, and drug resistance.24 PDGF, whose receptors belong to the family of receptor protein tyrosine kinases, plays a critical role in cellular proliferation and development.21 PDGF expression is commonly found in different human tumors, including lung adenocarcinoma. Pietras, Pahler, Bergers, and Hanahan2 confirmed that tumor cell–derived PDGFs contribute to the recruitment of both fibroblastic and vascular tumor stroma. Through its effects on fibroblastic and vascular tumor stroma, PDGF signaling may directly or indirectly promote tumor growth, metastasis, and drug resistance.2 These results provide a basis for exploring how lncRNAs participate in tumor progression, positively or negatively, by affecting different pathways.

Next, we identified five differentially expressed lncRNAs, all of which were up-regulated in the tumor samples. Four of them were known lncRNAs and one was novel. We tried to identify the potential biological function of the novel lncRNA, XLOC_009190. First, it might act by cis-regulating its upstream gene MGST1. MGST1 encodes a protein that catalyzes the conjugation of glutathione and the reduction of lipid hydroperoxides. According to the Human Protein Atlas, MGST1 can serve as a prognostic marker in thyroid cancer and renal cancer.31 Some specific polymorphisms of MGST1 have also been reported to increase the risk of multiple cancers.9,35 Furthermore, Johansson, Jarvliden, Gogvadze, and Morgenstern10 reported that MGST1 protects cells, as well as mitochondria, through its conjugation and glutathione peroxidase functions.

Second, the mRNAs potentially regulated by XLOC_009190 were identified by the correlation of their expression levels with that of the lncRNA. These correlated mRNAs were enriched in several Slimmer gene ontology terms, including detection of stimulus, transmembrane signaling receptor activity, and other related molecular functions. For example, G protein–coupled receptors are one of the largest signal-conveying receptor families and mediate various physiological processes. They are also involved in the biological mechanisms of carcinoma. Much evidence has linked G protein–coupled receptors and their downstream genes with cancer growth and development. In fact, G protein–coupled receptors control various features of tumorigenesis, including tumor immunity, proliferation, invasion, and metastasis.3 Potassium channels were also enriched. They are pore-forming transmembrane proteins that control potassium flow across cell membranes. One study hinted at the antitumor efficacy of potassium channel–modulating agents.7 Potassium channels can regulate tumor proliferation and migration through both ion permeation–dependent and ion permeation–independent functions. Huang and Jan7 proposed that potassium channels could be targeted in cancer therapy, considering their cell surface localization and well-known pharmacology.

Taken together, the novel lncRNA XLOC_009190 may affect patients with lung adenocarcinoma through two possible routes. It might act on MGST1, which protects cells through its conjugation and glutathione peroxidase functions. Alternatively, it may play a role in transmembrane receptors such as G protein–coupled receptors and potassium channels. In summary, we de novo assembled the lncRNAs of Chinese patients with lung adenocarcinoma, which were associated with the PI3 kinase pathway and other tumor-related pathways. Moreover, we identified four known lncRNAs and one novel lncRNA (XLOC_009190) that were differentially expressed between tumor and normal samples. We proposed the potential biological function of this novel lncRNA, but further experiments are needed to elucidate its roles.
  34 in total

1.  Discovery and annotation of long noncoding RNAs.

Authors:  John S Mattick; John L Rinn
Journal:  Nat Struct Mol Biol       Date:  2015-01       Impact factor: 15.369

2.  A large intergenic noncoding RNA induced by p53 mediates global gene repression in the p53 response.

Authors:  Maite Huarte; Mitchell Guttman; David Feldser; Manuel Garber; Magdalena J Koziol; Daniela Kenzelmann-Broz; Ahmad M Khalil; Or Zuk; Ido Amit; Michal Rabani; Laura D Attardi; Aviv Regev; Eric S Lander; Tyler Jacks; John L Rinn
Journal:  Cell       Date:  2010-08-06       Impact factor: 41.582

3.  Transcript assembly and quantification by RNA-Seq reveals unannotated transcripts and isoform switching during cell differentiation.

Authors:  Cole Trapnell; Brian A Williams; Geo Pertea; Ali Mortazavi; Gordon Kwan; Marijke J van Baren; Steven L Salzberg; Barbara J Wold; Lior Pachter
Journal:  Nat Biotechnol       Date:  2010-05-02       Impact factor: 54.908

4.  PANTHER version 11: expanded annotation data from Gene Ontology and Reactome pathways, and data analysis tool enhancements.

Authors:  Huaiyu Mi; Xiaosong Huang; Anushya Muruganujan; Haiming Tang; Caitlin Mills; Diane Kang; Paul D Thomas
Journal:  Nucleic Acids Res       Date:  2016-11-29       Impact factor: 16.971

5.  Expansion of the Gene Ontology knowledgebase and resources.

Authors: 
Journal:  Nucleic Acids Res       Date:  2016-11-29       Impact factor: 16.971

6.  fastp: an ultra-fast all-in-one FASTQ preprocessor.

Authors:  Shifu Chen; Yanqing Zhou; Yaru Chen; Jia Gu
Journal:  Bioinformatics       Date:  2018-09-01       Impact factor: 6.937

7.  WGCNA: an R package for weighted correlation network analysis.

Authors:  Peter Langfelder; Steve Horvath
Journal:  BMC Bioinformatics       Date:  2008-12-29       Impact factor: 3.169

8.  Functions of paracrine PDGF signaling in the proangiogenic tumor stroma revealed by pharmacological targeting.

Authors:  Kristian Pietras; Jessica Pahler; Gabriele Bergers; Douglas Hanahan
Journal:  PLoS Med       Date:  2008-01-29       Impact factor: 11.069

9.  Multiple evidence strands suggest that there may be as few as 19,000 human protein-coding genes.

Authors:  Iakes Ezkurdia; David Juan; Jose Manuel Rodriguez; Adam Frankish; Mark Diekhans; Jennifer Harrow; Jesus Vazquez; Alfonso Valencia; Michael L Tress
Journal:  Hum Mol Genet       Date:  2014-06-16       Impact factor: 6.150

Review 10.  G Protein-Coupled Receptors in Cancer.

Authors:  Rachel Bar-Shavit; Myriam Maoz; Arun Kancharla; Jeetendra Kumar Nag; Daniel Agranovich; Sorina Grisaru-Granovsky; Beatrice Uziely
Journal:  Int J Mol Sci       Date:  2016-08-12       Impact factor: 5.923
