
Transcriptome-wide analysis of UTRs in non-small cell lung cancer reveals cancer-related genes with SNV-induced changes on RNA secondary structure and miRNA target sites.

Radhakrishnan Sabarinathan1, Anne Wenzel1, Peter Novotny2, Xiaojia Tang3, Krishna R Kalari4, Jan Gorodkin1.   

Abstract

Traditional mutation assessment methods generally focus on predicting disruptive changes in protein-coding regions rather than non-coding regulatory regions like untranslated regions (UTRs) of mRNAs. The UTRs, however, are known to have many sequence and structural motifs that can regulate translational and transcriptional efficiency and stability of mRNAs through interaction with RNA-binding proteins and other non-coding RNAs like microRNAs (miRNAs). In a recent study, transcriptomes of tumor cells harboring mutant and wild-type KRAS (V-Ki-ras2 Kirsten rat sarcoma viral oncogene homolog) genes in patients with non-small cell lung cancer (NSCLC) have been sequenced to identify single nucleotide variations (SNVs). About 40% of the total SNVs (73,717) identified were mapped to UTRs, but omitted in the previous analysis. To meet this obvious demand for analysis of the UTRs, we designed a comprehensive pipeline to predict the effect of SNVs on two major regulatory elements, secondary structure and miRNA target sites. Out of 29,290 SNVs in 6462 genes, we predict 472 SNVs (in 408 genes) affecting local RNA secondary structure, 490 SNVs (in 447 genes) affecting miRNA target sites and 48 that do both. Together these disruptive SNVs were present in 803 different genes, out of which 188 (23.4%) were previously known to be cancer-associated. Notably, this ratio is significantly higher (one-sided Fisher's exact test p-value = 0.032) than the ratio (20.8%) of known cancer-associated genes (n = 1347) in our initial data set (n = 6462). Network analysis shows that the genes harboring disruptive SNVs were involved in molecular mechanisms of cancer, and the signaling pathways of LPS-stimulated MAPK, IL-6, iNOS, EIF2 and mTOR. In conclusion, we have found hundreds of SNVs which are highly disruptive with respect to changes in the secondary structure and miRNA target sites within UTRs. 
These changes hold the potential to alter the expression of known cancer genes or genes linked to cancer-associated pathways.

Year:  2014        PMID: 24416147      PMCID: PMC3885406          DOI: 10.1371/journal.pone.0082699

Source DB:  PubMed          Journal:  PLoS One        ISSN: 1932-6203            Impact factor:   3.240


Introduction

Next-generation genome sequencing is now widely used for the identification of genetic variations in cancer genomes [1], [2]. Non-small cell lung cancer (NSCLC) is the most common form of lung cancer and it is often found with activating mutations in the KRAS oncogene, which cause the tumor cells to be aggressive and resistant to chemotherapy [3]–[5]. In a recent study, Kalari et al. [6] performed transcriptome-wide sequencing of NSCLC and identified differentially expressed genes, alternate splicing isoforms and single nucleotide variants (SNVs) for tumors with and without KRAS mutations. A network analysis was performed with the genes showing differential expression (374 genes), alternate splicing (259 genes) and SNV-related changes (65 genes) that are differentially present in lung tumor groups with and without KRAS mutations. Integrated pathway analysis identified NFκB, ERK1/2 and AKT pathways as the most significant pathways differentially deregulated in KRAS wild-type as compared with KRAS mutated samples. A single nucleotide variant (SNV) is a nucleotide change at a single base position that occurs at a low frequency (also referred to as a rare variant). SNVs observed in tumor cells are mostly somatic variants and very few are germ-line variants. Genome-wide association studies (GWAS) report that SNVs occur mostly in non-coding regions rather than in coding (exonic) regions of RNAs [7]. In the past, however, most studies have focused on the effect of SNVs in coding regions (known as cSNVs or nsSNVs) [8] rather than the effect of SNVs in the regulatory non-coding DNA or non-coding RNA (rSNV). In the case of NSCLC, Kalari et al. [6] identified a total of 73,717 unique SNVs present in and around (+/−5 kb) RefSeq genes. Of these, 23,987 were cSNVs and their effects on coding regions have previously been predicted (see [6] for more details). The effects of rSNVs that are located in untranslated regions (UTRs) of protein-coding genes, however, still need to be analyzed. 
It is well known that UTRs play crucial roles in post-transcriptional regulation including mRNA stability [9], transport [10], localization [11], [12], translational activation [13] and repression [14], [15]. These functional regulations are carried out by cis-regulatory elements present in 5′ and 3′ UTRs. Notably, some of the cis-regulatory elements are structured, e.g., iron-responsive element (IRE), internal ribosome entry site (IRES) and selenocysteine insertion sequence (SECIS). The primary structure of cis-regulatory elements is also important for the binding of trans-acting RNA-binding proteins or other non-coding RNAs. For example, microRNAs (miRNAs) are small non-coding RNAs (about 22 nt) that bind to target sites mostly present in 3′ UTRs. This interaction results in either the cleavage of target mRNAs or repression of their translation. Several studies have reported that miRNA-mediated gene regulation plays a major role in cancer cells and such regulation has been considered as a potential drug target (see review [16]). All this evidence supports both sequence and structural motifs of UTRs being important for the control of gene expression. The occurrence of genetic variation(s) in UTRs could potentially affect their sequence and/or structural motifs and thus lead to changes in post-transcriptional regulation [17]–[20]. For example, a SNP in a let-7 miRNA target site (miRTS) in the 3′ UTR of KRAS has been identified to affect the binding of let-7 miRNAs. This results in the overexpression of KRAS, leading to an increased risk of NSCLC [21]. In addition, recent studies report that genetic variation can potentially create, change or destroy miRNA target sites, which results in dysregulation of the target mRNA [22], [23]. Notably, this has been identified in tumor cells as well [24]. 
Furthermore, a cancer-driven mutation present in an IRES in human p53 mRNA alters the structure of the IRES element, which inhibits binding of a trans-acting factor essential for translation [25]. The number of web servers and databases recently developed to deal with variants affecting miRTSs also demonstrates the growing importance of target site variants [22], [26]–[28]. In this study, we predict the possible effects of 29,290 SNVs associated with NSCLC that are located in the UTRs of mRNAs. The local effect of SNVs on the secondary structure of UTRs is predicted using RNAsnp [29] and the effect of SNVs on miRTSs in the UTR is predicted using TargetScan [30] and miRanda [31], which were shown to be among the more reliable miRNA target prediction methods [32]. The experimentally identified miRNA-mRNA maps, obtained by Argonaute (Ago) cross-linking immunoprecipitation coupled with high-throughput sequencing (CLIP-Seq), are further used to reduce the false positive predictions of miRTSs [33].

Materials and Methods

Data sources

The SNVs identified through RNA-sequencing of 15 primary lung adenocarcinoma tumors (8 with KRAS mutation and 7 without KRAS mutation) were extracted from Kalari et al. [6]. It should be noted that the previous study [6] had no RNA-sequencing data on normal cells, so it was not possible to separate SNVs derived from germ-line and somatic mutations. Thus, the data set obtained, 29,290 UTR SNVs in 6462 coding genes, is derived from both germ-line and somatic variants that were expressed in lung adenocarcinoma tumors. Based on the overlap of these SNVs with dbSNP (build 135), we could estimate that 40% of the 29,290 SNVs are germ-line variants. For those SNVs that overlap with dbSNP entries, we also extracted the SNPs in linkage disequilibrium using the SNAP server [34] (version 2.2; with the default parameters: r² ≥ 0.8, distance limit 500 kb, SNP data set 1000 Genomes pilot 1, and population panel CEU). RefSeq mRNA sequences corresponding to the 6462 genes (hg19 build) were downloaded from the UCSC genome browser (http://genome.ucsc.edu) [35]. For genes with multiple transcripts, all isoforms were considered. By mapping the 29,290 SNVs to these RefSeq mRNA sequences, we obtained 3646 in 5′ UTRs, 25,627 in 3′ UTRs and 17 in both 5′ and 3′ UTRs of overlapping transcripts. These SNVs were further subjected to our comprehensive pipeline (Figure 1) to predict their effect on RNA secondary structure and miRNA target sites, which is described in the following sections.
Figure 1

Pipeline for the analysis of effect of SNVs on UTRs of mRNA.

A list of cancer-associated genes was obtained from COSMIC [36] and Qiagen/SABioSciences [37]. This list includes 1347 of the 6462 genes considered in this analysis. To test whether genes carrying disruptive SNVs (those affecting secondary structure and/or miRTSs) are enriched among cancer-associated genes, we performed a one-sided Fisher's exact test. This was computed from a 2×2 contingency table (n = 6462) with the number of genes carrying/not carrying disruptive SNVs on the one side and the number of cancer-associated/other genes on the other side. Similarly, the enrichment for disruptive SNVs in cancer-associated genes was computed by classifying the total number of SNVs (n = 29,290) into disruptive/non-disruptive SNVs on the one hand, and those present or not present in cancer-associated genes on the other. A set of experimentally verified examples of SNPs with effects on miRNA target sites has been extracted from the literature (see Table S1). These 19 SNPs (affecting 25 miRNA-mRNA interactions) have been used to test the filtration criteria used in the miRNA part of our pipeline.
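The gene-level enrichment test described above can be illustrated with a small standard-library sketch. The function name `fisher_right_tailed` and the way the table is assembled are illustrative assumptions; the counts (803 genes with disruptive SNVs, 188 of them cancer-associated, 1347 cancer-associated genes among 6462) are those reported in the abstract. This is not the authors' code, only one way to compute a right-tailed hypergeometric p-value.

```python
from math import comb


def fisher_right_tailed(a, b, c, d):
    """Right-tailed Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Returns P(X >= a) with row and column totals held fixed
    (hypergeometric null distribution).
    """
    row1 = a + b             # first row total (e.g. genes with disruptive SNVs)
    col1 = a + c             # first column total (e.g. cancer-associated genes)
    total = a + b + c + d
    upper = min(row1, col1)  # largest value the top-left cell can take
    numerator = sum(
        comb(col1, k) * comb(total - col1, row1 - k)
        for k in range(a, upper + 1)
    )
    # Exact integer arithmetic; Python converts the ratio to a float safely.
    return numerator / comb(total, row1)


# Counts taken from the text; the table layout itself is an assumption.
n_genes, n_cancer = 6462, 1347
n_disruptive, n_disruptive_cancer = 803, 188

table = (
    n_disruptive_cancer,                                        # disruptive, cancer
    n_disruptive - n_disruptive_cancer,                         # disruptive, other
    n_cancer - n_disruptive_cancer,                             # not disruptive, cancer
    n_genes - n_disruptive - (n_cancer - n_disruptive_cancer),  # not disruptive, other
)
p_value = fisher_right_tailed(*table)
print(f"one-sided p = {p_value:.3f}")
```

The printed value can be compared with the p-value reported in the abstract; small differences would be expected if the authors' table was assembled differently.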

Prediction of SNVs' effect on RNA secondary structure

The effect of SNVs on RNA secondary structure was predicted using RNAsnp (version 1.1) [29]. The wild-type mRNA sequences and the SNVs were given as input along with default parameters of RNAsnp. For each SNV, RNAsnp considered a window of +/−200 nts around the SNV position to generate the wild-type (WT) and mutant (MT) subsequences and computed their respective base pair probability matrices $P^{WT}$ and $P^{MT}$. Then, the difference between the base pair probabilities of the wild-type and mutant structures was measured using the Euclidean distance (d) and the Pearson correlation coefficient (r) for all local regions within the subsequence. For completeness, we briefly summarize these two measures as follows. The first one computes the difference between two matrices directly by

$$d(P^{WT}, P^{MT}) = \sum_{i<j} \left(P^{WT}_{ij} - P^{MT}_{ij}\right)^2,$$

where $P_{ij}$ is the probability of bases i and j being paired. The second measure uses the position-wise pair probabilities $\pi_i = \sum_j P_{ij}$. For a local region $[k, l]$, the vector $\pi^{[k,l]}$ contains the elements $\pi_k, \ldots, \pi_l$. Then the difference between two vectors $\pi^{WT}$ and $\pi^{MT}$ is measured by

$$r(\pi^{WT}, \pi^{MT}) = \frac{\sum_{i=k}^{l} \left(\pi^{WT}_i - \bar{\pi}^{WT}\right)\left(\pi^{MT}_i - \bar{\pi}^{MT}\right)}{\sqrt{\sum_{i=k}^{l} \left(\pi^{WT}_i - \bar{\pi}^{WT}\right)^2}\,\sqrt{\sum_{i=k}^{l} \left(\pi^{MT}_i - \bar{\pi}^{MT}\right)^2}},$$

where $\bar{\pi}$ denotes the mean of the vector elements over the local region. Finally, a local region predicted with maximum Euclidean distance (dmax) or minimum Pearson correlation coefficient (rmin) and the corresponding p-value is then reported. We employ both measures independently as both measures hold their respective strengths and weaknesses (see [29] for details). We generated two lists (each with p<0.1) of candidates, dmax and rmin, and each of them is subjected to a multiple-testing correction using the Benjamini-Hochberg procedure [38], which limits the false discovery rate to be no more than a chosen threshold (typically 10%). To analyze whether the RNAsnp predicted local region is structurally conserved, we used the annotations of conserved RNA secondary structure predictions from our in-house pipeline [39], which makes use of a range of tools including CMfinder [40] and RNAz [41] programs.
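To make the two structure measures and the multiple-testing step more concrete, here is a minimal sketch on a toy 4-nt example. It assumes that the matrix distance is the sum of squared differences over all pairs i<j and that the position-wise probability is the row sum of a symmetric matrix. The function names and the toy matrices are hypothetical, and the sketch does not reproduce RNAsnp's local-region search or its p-value calculation.

```python
from math import sqrt


def matrix_distance(p_wt, p_mt):
    """Sum of squared differences between pair probabilities over i < j."""
    n = len(p_wt)
    return sum(
        (p_wt[i][j] - p_mt[i][j]) ** 2
        for i in range(n)
        for j in range(i + 1, n)
    )


def position_probabilities(p):
    """pi_i = sum_j P_ij for a symmetric base pair probability matrix."""
    return [sum(row) for row in p]


def pearson(x, y):
    """Pearson correlation of two equal-length vectors (non-zero variance)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):  # walk from the largest p-value down
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def toy_matrix(outer, inner):
    """Toy 4-nt matrix with pairs (0,3) and (1,2); values are made up."""
    p = [[0.0] * 4 for _ in range(4)]
    p[0][3] = p[3][0] = outer
    p[1][2] = p[2][1] = inner
    return p


p_wt = toy_matrix(0.9, 0.8)  # hypothetical wild-type probabilities
p_mt = toy_matrix(0.1, 0.8)  # hypothetical mutant: outer pair weakened
d = matrix_distance(p_wt, p_mt)
r = pearson(position_probabilities(p_wt), position_probabilities(p_mt))
q = benjamini_hochberg([0.01, 0.04, 0.03, 0.5])
```

In a pipeline of this kind, candidates would be kept when their adjusted p-value stays below the chosen false discovery rate, applied separately to the dmax and rmin lists.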

Prediction of SNVs' effect on microRNA target sites

For each SNV in the data set, a subsequence of 30 nts on either side of the SNV position was retrieved. Further, all 2042 human mature miRNA sequences from miRBase (v19) [42] were used to scan for possible target sites in wild-type and mutant (with SNV) subsequences. As a first step, TargetScan (version 6.0) [30] was used to identify pairs of SNVs and miRNAs for which the type of seed match differs between wild-type and mutant or is only present in either of them. The different seed types used by TargetScan are 7mer-1a, 7mer-m8, and 8mer-1a (in increasing strength), where ‘1a’ refers to an adenosine in the miRTS 3′ to the seed match (i.e., opposite the first nucleotide of the miRNA) and ‘-m8’ refers to a Watson-Crick-matched nucleotide in position 8. Subsequently, the interaction energy of these pairs was computed using miRanda (version 3.3a) [31]. As a seed match change is already required by the TargetScan filtration, the parameters for miRanda were set to not weigh the seed region too high (‘-scale 2’ instead of default 4) and with relaxed cutoffs (score 45, energy −5 kcal/mol), in order to capture cases where a poor seed match can be compensated. To classify an interaction as working we later apply a more conservative energy threshold of −11 kcal/mol based on our previous study [43]. For each pair of miRNA and 61mer, only the strongest binding site (lowest interaction energy $\Delta G$) that differs between WT and SNV sequence is retained. Putative interactions are classified as created, destroyed or altered upon mutation based on miRanda predictions. The create set contains target sites that are induced by the SNV, i.e., have an interaction energy of −11 kcal/mol or lower in the SNV variant, while no interaction is predicted in the wild-type (either due to score or energy threshold). Similarly, a loss of target site would be recorded in the destroy set, if the interaction is predicted for the wild-type but not with the SNV. 
Finally, the alter set contains putative interactions that are predicted with a binding energy of at most −11 kcal/mol for at least one variant. For these, the energy difference observed for the binding of miRNA before and after SNV introduction was measured as their log-ratio $lr = \mathrm{ld}(\Delta G_{SNV} / \Delta G_{WT})$, for $\Delta G < 0$, where ld denotes the binary logarithm. The lr is 0 if there is no change in energy; negative if the wild type has the stronger interaction (lower energy), positive otherwise. Given the size of the data set, we focus on the (top) candidates whose absolute lr value is above the mean (μ) of absolute lr values from all pairs classified as alter. The efficiency of this threshold clearly varies with the data, but it will always retain the top candidates with highest relative energy difference. Even though it should not be seen as a fixed cut-off, we applied it to our set of known examples, where 14 out of 23 interactions exceed the value we applied here (see Table S1). Threshold values based on the distribution of MFE changes have been used in a similar way before [24]. In order to reduce false positive predictions, the miRNA target sites predicted for the wild-type (destroy or alter) were cross-checked with experimentally identified microRNA-target interaction maps. Those data, derived through Ago CLIP-Seq, were downloaded from starBase [44]. Only SNVs that are located inside stringent Ago CLIP-Seq peak clusters with a biological complexity (BC) of at least two were retained. This filter cannot be used for the interactions from the create set, as CLIP-Seq data is available for the wild-type only. Finally, the set of miRNAs was filtered for those expressed in the respiratory system (lung and trachea) according to the miRNA body map [45]. The overview of the miRNA analysis is described in Figure 2.
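The create/destroy/alter classification and the log-ratio filter can be sketched as follows. This is a simplified illustration, not the authors' implementation: energies are passed as plain numbers in kcal/mol, `None` stands for "no interaction passed the relaxed miRanda cutoffs", and applying the −11 kcal/mol threshold to the destroy set in the same way as to the create set is an assumption. All function names are hypothetical.

```python
from math import log2

ENERGY_CUTOFF = -11.0  # kcal/mol, conservative threshold stated in the text


def classify(e_wt, e_snv, cutoff=ENERGY_CUTOFF):
    """Classify a miRNA/SNV pair as 'create', 'destroy', 'alter' or None.

    e_wt / e_snv: interaction energy (kcal/mol) for wild-type and SNV
    sequence, or None when no interaction was predicted.
    """
    if e_wt is None and e_snv is None:
        return None
    if e_wt is None:
        return "create" if e_snv <= cutoff else None
    if e_snv is None:
        return "destroy" if e_wt <= cutoff else None
    if e_wt != e_snv and min(e_wt, e_snv) <= cutoff:
        return "alter"
    return None


def log_ratio(e_wt, e_snv):
    """lr = log2(E_snv / E_wt); defined only for negative energies."""
    if e_wt >= 0 or e_snv >= 0:
        raise ValueError("both energies must be negative")
    return log2(e_snv / e_wt)


def top_alter(pairs):
    """Keep 'alter' pairs whose |lr| exceeds the mean |lr| of all pairs."""
    lrs = {name: log_ratio(wt, snv) for name, (wt, snv) in pairs.items()}
    mean_abs = sum(abs(v) for v in lrs.values()) / len(lrs)
    return {name: v for name, v in lrs.items() if abs(v) > mean_abs}


# Made-up energies for three hypothetical 'alter' pairs: (E_wt, E_snv)
example = {"a": (-10.0, -20.0), "b": (-16.0, -8.0), "c": (-12.0, -12.5)}
kept = top_alter(example)
```

With these made-up energies, pair "a" gains binding strength with the SNV (positive lr), pair "b" loses it (negative lr), and pair "c" changes too little to pass the mean-based cut-off.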
Figure 2

Pipeline for the analysis of SNVs' effect on miRNA target sites in more detail (dashed box from Figure 1).

The flow chart shows the different steps of prediction and filtration with the number of individual SNVs, miRNAs, and pairs of these at each stage.

From the PhenomiR [46] database, we retrieved information about miRNAs that have been found to be up- or down-regulated in lung cancer. This set comprises 264 individual miRNA stem-loop accessions, 3 of which are ‘dead entries’. The remaining 261 stem-loops give rise to 430 mature miRNA products, which we refer to as lung cancer-associated miRNAs. In this data set, 27 miRNAs are specific to the NSCLC [47] type according to the miRNA body map [45]. The latter data set is referred to as NSCLC-associated miRNAs.
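The seed-type comparison at the start of the miRNA pipeline can be illustrated with plain string matching. The sketch below is not TargetScan; it only mimics the three seed types named in the Methods (7mer-1a, 7mer-m8, 8mer-1a) on RNA strings written 5′ to 3′, and the function names and the example UTR fragments are hypothetical. The miRNA used is the widely cited hsa-let-7a-5p sequence.

```python
COMPLEMENT = str.maketrans("ACGU", "UGCA")


def revcomp(rna):
    """Reverse complement of an RNA string (Watson-Crick pairs only)."""
    return rna.translate(COMPLEMENT)[::-1]


def strongest_seed_type(mirna, utr):
    """Return the strongest seed type of `mirna` found in `utr`, or None.

    Both sequences are RNA strings written 5' to 3'.
    """
    site_2_7 = revcomp(mirna[1:7])  # target of miRNA nucleotides 2-7
    site_2_8 = revcomp(mirna[1:8])  # target of miRNA nucleotides 2-8
    if site_2_8 + "A" in utr:       # m8 match plus A opposite position 1
        return "8mer-1a"
    if site_2_8 in utr:             # match extends to position 8
        return "7mer-m8"
    if site_2_7 + "A" in utr:       # 6-nt seed match plus A opposite position 1
        return "7mer-1a"
    return None


def seed_change(mirna, utr_wt, utr_snv):
    """Return (wt_type, snv_type) if the seed type differs, else None."""
    wt = strongest_seed_type(mirna, utr_wt)
    snv = strongest_seed_type(mirna, utr_snv)
    return (wt, snv) if wt != snv else None


LET7A = "UGAGGUAGUAGGUUGUAUAGUU"   # hsa-let-7a-5p
wt_fragment = "AAGCUACCUCAGG"      # made-up UTR fragment with an 8mer site
snv_fragment = "AAGCUACCUCGGG"     # same fragment with A>G after the seed match
change = seed_change(LET7A, wt_fragment, snv_fragment)
```

In this toy case the SNV removes the adenosine 3′ of the seed match, so the site weakens from 8mer-1a to 7mer-m8 and the pair would be passed on to the energy-based step.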

Ingenuity Pathways Analysis

Interactome networks of candidate genes were constructed using Ingenuity Pathway Analysis (IPA) software (Ingenuity® Systems, www.ingenuity.com; build version: 220217; content version: 16542223). Network generation is based on the ‘Global Molecular Network’ in IPA, which comprises an extensive, manually curated set of gene-gene relationships based on findings from the scientific literature. The genes of interest (candidates put out by our pipeline and used as input for IPA) that are also present in this global network are the so-called focus genes. Highly-interconnected focus genes are the starting points in network generation. Additional non-focus genes from the Global Molecular Network might be used as linker genes between small networks. Networks are extended until an approximate size of 35 genes, which is considered optimal for visualization and interpretation (for details see [48]). The p-value computed for each network represents the probability to find the same (or higher) number of focus genes in a randomly selected set of genes from the global network. It is computed by a right-tailed Fisher Exact Test with (non-)focus molecules on the one side and molecules (not) in the network on the other side of a 2×2 contingency table. This is transformed into a score which is the negative log of the p-value. Furthermore, IPA was used to identify the top diseases and disorders, molecular and cellular functions, and canonical pathways associated with the genes in our candidate sets (so called ‘focus genes’). The p-value for a given (disease, function or pathway) annotation describes the likelihood that the association between the input gene set and the annotation is due to random chance. This is also based on a right-tailed Fisher's exact test as specified above.

Results

Effect of SNVs on RNA secondary structure

The structural effects of 29,290 UTR SNVs were predicted using RNAsnp (v1.1). Both the Euclidean distance (dmax) and Pearson correlation coefficient (rmin) measures of RNAsnp (mode 1) were independently employed to predict the effect of SNVs on local RNA secondary structure. The distribution of p-values calculated for the 29,290 UTR SNVs is shown in Figure S1A. At a significance level of 0.1 (chosen from our previous study [29]), 3237 and 3062 SNVs were predicted by the dmax and rmin measures, respectively. Further, the adjustment for multiple comparisons (using the Benjamini–Hochberg procedure [38]) provided 3204 and 1813 SNVs for the dmax and rmin measures, respectively. After merging these two lists, we obtained 3561 unique SNVs in 2411 genes. Further, we calculated the distance between the location of these 3561 SNVs and the predicted local region where the maximum structural change was detected (Figure S1B). It shows that the majority of the SNVs cause structural change in and around the SNV position. In addition, the length distribution of the predicted local region shows that the majority of SNVs affect a local region of 50 to 100 nts; however, certain SNVs (n = 47) affect the global structure, with the predicted local region exceeding 300 nts (Figure S1C). Furthermore, we checked whether these disruptive SNVs are enriched in GC- or AU-rich regions, as sequences with such biased nucleotide content have been shown to be more sensitive to structural changes caused by mutations [49]. For each SNV we computed the GC content of its flanking regions (as previously, using 200 nts up- and downstream; see Materials and Methods). This showed that both the data set SNVs and the disruptive SNVs are highly enriched in regions with GC content ranging from 40 to 60 percent (see Figure S2), which should therefore make them less sensitive to variations. 
In addition, we found no significant difference between the GC content distributions of disruptive SNVs and the data set SNVs (Kolmogorov-Smirnov test; see Figure S2). It is known that the UTRs of mRNAs harbor evolutionarily conserved regulatory elements ([50]–[52], see also reviews [53], [54]). Thus, we cross-checked for the overlap between the disrupted local region predicted by RNAsnp and the conserved RNA secondary structures predicted using our in-house pipeline [39] (see Materials and Methods section for details). Interestingly, the local regions predicted for 472 SNVs (p-value<0.1) overlap with the predicted conserved RNA secondary structures. These 472 SNVs correspond to 408 genes; out of which 111 SNVs correspond to 98 genes that are involved in cancer-associated pathways (see File S1.xlsx). Based on the p-value, the above 111 SNVs were further classified into two groups: 28 as high-confidence for which both dmax and rmin p-value<0.05 (Table 1), and the other 83 as medium-confidence (either dmax or rmin p-value<0.1) (see File S2.xlsx). We predict that the SNV-induced structural changes in the UTRs could potentially affect the stability of the mRNA or disrupt the function of regulatory elements present in the UTRs. For example, the SNV A2304C (Table 1) present in the 3′ UTR of MAPK14 mRNA shows a significant structural change (p-value: 0.0076) in the local RNA secondary structure, which is structurally conserved according to both CMfinder and RNAz predictions from our in-house pipeline [39]. This structural conservation shows that the region is under evolutionary pressure to maintain the structure, which is likely to have some functional importance. The protein encoded by the MAPK14 gene is a member of the MAP kinase family, which is known to be involved in many pathways related to cell division, maturation and differentiation (reviewed in [55]). Also, it has been predicted to be one of the key players in the lung cancer interactome [6]. 
Thus the alteration in the gene expression of MAPK14 at a post-transcriptional level due to the SNV-induced structural change could potentially affect the MAPK14-related signaling pathways.
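The GC-content check on the flanking regions mentioned above can be sketched in a few lines. Whether the SNV position itself is counted is not stated in the text; the sketch includes it, which is an assumption, and the function name `flank_gc` is hypothetical.

```python
def flank_gc(seq, pos, flank=200):
    """GC fraction of the window of up to `flank` nts on each side of `pos`.

    seq:  mRNA sequence (RNA or DNA alphabet, any case)
    pos:  1-based SNV position; the SNV base is included in the window
          (an assumption, the text does not specify this)
    """
    if not 1 <= pos <= len(seq):
        raise ValueError("pos must lie within the sequence")
    start = max(0, pos - 1 - flank)   # window is truncated at sequence ends
    end = min(len(seq), pos + flank)
    window = seq[start:end].upper()
    return sum(base in "GC" for base in window) / len(window)


# Toy example with a 2-nt flank: the window around position 4 is "AGGCC"
gc = flank_gc("AAGGCCUU", 4, flank=2)
```

Windows of this kind, computed for all SNVs and for the disruptive subset, would give the two distributions that are compared in Figure S2.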
Table 1

List of 28 high-confidence SNVs with p-value<0.05 predicted by both dmax and rmin measures of RNAsnp.

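| Gene | mRNA | UTR | SNV | RNAsnp dmax (p-value) | % overlap with conserved secondary structure (a) | RNAsnp rmin (p-value) | % overlap with conserved secondary structure (a) | dbSNP 135 |
| GSR | NM_001195102 | 3′ | A2638G | 0.0030 | 89$ | 0.0213 | 100$ | rs1138092 |
| MEF2A | NM_005587 | 3′ | A2046U | 0.0032 | 100$ | 0.0238 | 100$ | |
| PPM1A | NM_177952 | 3′ | G2231A | 0.0059 | 100# | 0.0118 | 100# | |
| MAPK14 | NM_139012 | 3′ | A2304C | 0.0076 | 100#;96$ | 0.0224 | 91#;91$ | |
| PHC2 | NM_198040 | 3′ | A3730C | 0.0105 | - | 0.0137 | 75# | |
| BECN1 | NM_003766 | 3′ | U1970C | 0.0120 | 100# | 0.0487 | 100# | |
| NFKBIE | NM_004556 | 3′ | G1659C | 0.0144 | 84$ | 0.0473 | 100$ | |
| MAPK1 | NM_002745 | 3′ | U2360G | 0.0148 | 100# | 0.0262 | 100# | rs13058 |
| DHCR24 | NM_014762 | 3′ | A4192C | 0.0159 | 100# | 0.0494 | 100# | |
| ADAMTS1 | NM_006988 | 3′ | U4320G | 0.0166 | 94# | 0.0215 | 83# | |
| SRF | NM_003131 | 3′ | C3504U | 0.0173 | 100# | 0.0074 | 100# | rs3734681 |
| CASP2 | NM_032982 | 3′ | U2139C | 0.0189 | 56# | 0.0230 | 62# | |
| LFNG | NM_001040167 | 3′ | C1838G | 0.0215 | 64# | 0.0333 | - | rs4721752 |
| SH3PXD2A | NM_014631 | 3′ | C8560U | 0.0223 | 100$ | 0.0073 | - | |
| KITLG | NM_000899 | 3′ | U1057G | 0.0226 | 50$;84# | 0.0080 | 90# | |
| PRKAB1 | NM_006253 | 3′ | U1875C | 0.0237 | 100$ | 0.0462 | 100$ | |
| TFG | NM_001195479 | 5′ | G309C | 0.0256 | - | 0.0398 | 56# | |
| FTH1 | NM_002032 | 3′ | U819G | 0.0262 | 93# | 0.0259 | 100# | |
| BCL2L2 | NM_001199839 | 3′ | C2469A | 0.0275 | 78# | 0.0294 | 100# | rs3210043 |
| CDKN1C | NM_000076 | 3′ | G1334C | 0.0290 | 87# | 0.0253 | 87# | |
| TIA1 | NM_022173 | 3′ | U4082A | 0.0316 | 100# | 0.0363 | 100# | |
| NFKBIE | NM_004556 | 3′ | U1644G | 0.0333 | 98# | 0.0347 | 100# | |
| DAPK3 | NM_001348 | 3′ | G1662U | 0.0334 | 54# | 0.0216 | 94# | rs3745982 |
| NCOA1 | NM_003743 | 3′ | C4893G | 0.0342 | 100# | 0.0176 | 100# | rs17737058 |
| PCBP4 | NM_001174100 | 3′ | C1790G | 0.0413 | 59# | 0.0187 | - | |
| SH3PXD2A | NM_014631 | 3′ | U8562A | 0.0451 | 100$ | 0.0096 | 100$ | |
| ID2 | NM_002166 | 5′ | C143G | 0.0464 | 87# | 0.0474 | 83# | |
| GPX3 | NM_002084 | 3′ | U1552G | 0.0474 | 73# | 0.0427 | 100# | |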

a The conserved RNA secondary structure predicted by CMfinder and RNAz programs (through our in-house pipeline [39]) are highlighted with the symbols # and $, respectively.

As another example, the gene GPX3 encodes plasma glutathione peroxidase, an antioxidant enzyme that contains selenocysteine in its active site and catalyzes the reduction of hydrogen peroxide. The amino acid selenocysteine is encoded by the UGA codon, which normally functions as a stop codon. In the GPX3 mRNA, the alternate recognition of a UGA codon as a selenocysteine codon is mediated by the cis-acting regulatory element, selenocysteine insertion sequence (SECIS), present in the 3′ UTR and other trans-acting co-factors [56]. The SNV U1552G (Table 1) located in the 3′ UTR of GPX3 mRNA was predicted to cause a significant structural effect (p-value: 0.0474) in the local region which contains the SECIS regulatory element. Figure 3 shows the base pair probabilities corresponding to the local region (NM_002084:1544 to 1692) of wild-type and mutant mRNA. It can be seen that the wild-type has higher base pair probabilities to form the stable stem-loop structure of SECIS (highlighted with a circle in Figure 3), whereas in the mutant form it is disrupted due to the SNV, which is located outside the SECIS region. A previous study has shown that the characteristic stem-loop structure of SECIS is essential for the efficiency of UGA recoding in vivo and in vitro [56]. Based on this, we speculate that the structural change induced by the SNV U1552G in the SECIS element may affect the efficiency of UGA recoding.
Figure 3

Results of SNV U1552G predicted to cause significant local secondary structure changes in 3′ UTR of GPX3 mRNA.

The dot plot from the RNAsnp web server [67] shows the base pair probabilities corresponding to the local region predicted with a significant difference (d p-value: 0.0474) between wild-type and mutant. The upper triangle represents the base pair probabilities for the wild-type (green) and the lower triangle for the mutant (red). On the sides, the minimum free energy (MFE) structures of the wild-type and mutant are displayed in planar graph representation. The SECIS region is highlighted with a blue circle and the SNV position is indicated with an arrow.

Further, considering both the set of high-confidence and medium-confidence SNVs, we found that the genes (n = 15) listed in Table 2 harbor more than one disruptive SNV in the predicted conserved structural region of mRNA. For example, the gene ID2 encodes DNA-binding protein inhibitor ID-2, which is a critical factor for cell proliferation and differentiation in normal vertebrate development. Overexpression of the ID-2 protein is frequently observed in various human tumors, including NSCLC [57]. In the mRNA sequence of the ID2 gene, two SNVs located in the 5′ UTR were independently predicted to cause significant local structural change in the conserved region. Previous studies have shown that SNP- or mutation-induced structural changes in the 5′ UTRs can lead to uncontrolled translation or overexpression of the respective proteins [58], [59]. We predict that the two SNVs that cause significant change in the structurally conserved region could affect the translation efficiency of ID2 mRNA.
Table 2

List of genes which have more than one disruptive SNV (combined high-confidence and medium-confidence candidates) in the UTRs.

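| Gene | mRNA | UTR | SNV (a) | RNAsnp (p-value) (b) | % overlap of predicted local region with conserved RNA secondary structure (c) | dbSNP 135 |
| SH3PXD2A | NM_014631 | 3′ | C8560U | 0.0223 | 100$ | |
| SH3PXD2A | NM_014631 | 3′ | U8562A† | 0.0451 | 100$ | |
| MAPK1 | NM_002745 | 3′ | G1633A | 0.0815 | 100# | rs41282607 |
| MAPK1 | NM_002745 | 3′ | U2360G† | 0.0148 | 100# | rs13058 |
| ACOX1 | NM_004035 | 3′ | U4708G | 0.0650 | 100# | |
| ACOX1 | NM_004035 | 3′ | A6386U | 0.0750 | 59# | |
| ADAMTS1 | NM_006988 | 3′ | U4320G | 0.0166 | 94# | |
| ADAMTS1 | NM_006988 | 3′ | U3449C | 0.0626 | 86$ | |
| CDC42 | NM_001039802 | 5′ | C159A | 0.0665 | 100$ | |
| CDC42 | NM_001039802 | 5′ | G152A | 0.0901 | 100$ | |
| ID2 | NM_002166 | 5′ | C143G† | 0.0464 | 87# | |
| ID2 | NM_002166 | 5′ | C129G | 0.069 | 87# | |
| NFKBIE | NM_004556 | 3′ | G1659C† | 0.0144 | 84# | |
| NFKBIE | NM_004556 | 3′ | U1644G† | 0.0333 | 98# | |
| RASSF1 | NM_170714 | 3′ | A1907U | 0.0629 | 64# | |
| RASSF1 | NM_170714 | 3′ | G1904A | 0.0659 | 64# | |
| RXRB | NM_021976 | 3′ | U2066G | 0.0268 | 59# | rs2744537 |
| RXRB | NM_021976 | 3′ | U2053A | 0.0452 | 73# | rs5030979 |
| PCBP4 | NM_001174100 | 3′ | U1862G | 0.0401 | 93# | |
| PCBP4 | NM_001174100 | 3′ | C1790G | 0.0413 | 59# | |
| MTA2 | NM_004739 | 5′ | A227G | 0.0462 | 73# | |
| BECN1 | NM_003766 | 3′ | U1970C† | 0.012 | 100# | |
| CTSB | NM_147782 | 3′ | A2561G | 0.0569 | 100$ | |
| HTT | NM_002111 | 3′ | C9948G† | 0.0987 | 100# | rs362305 |
| HTT | NM_002111 | 3′ | U9947C | 0.0225* | 100# | |
| BECN1 | NM_003766 | 3′ | G2053A | 0.0329* | 98# | rs11552193 |
| LMNB2 | NM_032737 | 3′ | A3713G | 0.0554* | 59# | |
| LMNB2 | NM_032737 | 3′ | U3662C | 0.0638* | 57# | |
| CTSB | NM_147782 | 3′ | A2581G | 0.0925* | 50$ | |
| MTA2 | NM_004739 | 5′ | C267G | 0.1035* | 53# | |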

a SNVs that were predicted by both dmax and rmin measures are highlighted with †.

b The p-value corresponding to the rmin measure is highlighted with *.

c The conserved RNA secondary structure predicted by CMfinder and RNAz program (through our in-house pipeline [39]) are highlighted with the symbols # and $, respectively.

Out of the 472 disruptive SNVs obtained from the secondary structure analysis (before intersecting with the cancer-associated genes), 199 overlap with SNPs from dbSNP (build 135). Of these 199 SNVs, 17 are in linkage disequilibrium (LD) with other SNPs that are located proximal (+/−200 nts) to the SNV position. These 17 pairs were tested with RNAsnp to check whether the SNP in LD with the (disruptive) SNV forms a structure-stabilizing haplotype [60]. Of these 17 pairs, five were predicted to cause no significant structural changes and could thus be structure-stabilizing haplotypes, whereas the other 12 pairs showed significant structural changes (see File S3.xls).

Effect of SNVs on microRNA target sites

Screening all human mature miRNAs against all identified SNVs with flanking sequence yields 2×59,810,180 possible combinations. The initial TargetScan step is a conservative filter and reduces the set of SNV-miRNA pairs to 0.2% of this. We then apply miRanda as a second target prediction method, followed by a set of filters. The distribution of lr values in the alter set is shown in Figure S3; only cases with relative changes larger than the described cut-off are considered (see Materials and Methods). Figure 2 shows the different steps with individual counts of putative interaction sites at each stage. This gives us 490 SNVs in 447 genes predicted to affect 707 interactions with 344 miRNAs (see File S4.xlsx). After intersection with known cancer-associated genes (final step in the pipeline, Figure 1), we find 124 SNVs and 148 miRNAs to be involved in 186 interactions that differ with the mutation. These SNVs that induce putative miRTS changes can be further classified into those enhancing interaction with the mutant (80) or wild-type (52) variant. Table 3 lists all genes that contain more than one miRTS predicted to be changed between wild-type and SNV. This includes examples where the same SNV changes the target site for different mature miRNAs from the same family, but also examples where different SNVs within the gene cause a gain or loss of a miRTS. Similarly, all miRNAs with more than two changed target sites are presented in Table 4. It lists members of the miR-29 family, which have previously been reported to act as tumor suppressors as well as oncogenes (see [61] for a review).
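The size of the initial search space follows directly from the counts given in Materials and Methods. A short check, with hypothetical variable and function names, reads:

```python
N_MIRNAS = 2042   # human mature miRNAs from miRBase v19 (Materials and Methods)
N_SNVS = 29_290   # UTR SNVs in the data set


def screening_size(n_mirnas, n_snvs, n_variants=2):
    """Return (SNV-miRNA pairs, sequence scans over wild-type and mutant)."""
    pairs = n_mirnas * n_snvs
    return pairs, n_variants * pairs


pairs, scans = screening_size(N_MIRNAS, N_SNVS)
print(f"{pairs:,} pairs, {scans:,} scans")
```

The factor of two reflects that every pair is evaluated once against the wild-type and once against the mutant subsequence.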
Table 3

List of genes which have more than one miRNA target site change (create, alter, destroy) in their UTRs.

| Gene | mRNA | UTR | SNV | dbSNP 135 | miRNA(s) |
|---|---|---|---|---|---|
| ACTR3 | NM_005721 | 3 | G2078A | rs6642 | miR-662 |
| ACTR3 | NM_005721 | 5 | U232G | | miR-18a-3p |
| AMD1 | NM_001033059 | 5 | A262G | | miR-1236-3p |
| AMD1 | NM_001634 | 5 | A263G | | miR-1236-3p |
| ARL5B | NM_178815 | 3 | A947U | | miR-409-3p |
| ARL5B | NM_178815 | 3 | G2609U | rs12098599 | miR-362-3p |
| ARL5B | NM_178815 | 3 | U946C | | miR-124-5p, miR-599 |
| BCL2L13 | NM_001270731 | 3 | U1850G | rs725768 | miR-361-3p |
| BCL2L13 | NM_001270731 | 3 | U2269A | rs74932682 | miR-519b-3p |
| BCL7A | NM_001024808 | 3 | G1801C | | miR-650 |
| BCL7A | NM_001024808 | 3 | U1804C | | miR-650 |
| CALM1 | NM_006888 | 3 | A1872G | rs63576962 | miR-211-5p |
| CALM1 | NM_006888 | 3 | C2472G | | miR-29a-3p, miR-29b-3p, miR-29c-3p |
| CBX1 | NM_001127228 | 3 | A1839U | rs6847 | miR-548b-5p, miR-548c-5p, miR-548d-5p |
| CBX5 | NM_012117 | 3 | A2592G | | miR-887 |
| CBX5 | NM_012117 | 3 | C11158U | | miR-654-5p |
| CCND2 | NM_001759 | 3 | C2086U | | miR-21-3p |
| CCND2 | NM_001759 | 3 | G5917U | | miR-139-5p |
| CKLF | NM_016326 | 5 | U72G | | miR-29a-3p, miR-29b-3p, miR-29c-3p |
| EIF4EBP2 | NM_004096 | 3 | C5092G | | miR-15b-5p, miR-16-5p, miR-195-5p, miR-424-5p, miR-503-5p, miR-646 |
| IGFBP5 | NM_000599 | 3 | G2493C | | miR-675-5p |
| IGFBP5 | NM_000599 | 3 | G3898U | rs13403592 | miR-29a-3p, miR-29b-3p, miR-29c-3p |
| JAK1 | NM_002227 | 3 | U5007A | | miR-106a-5p, miR-106b-5p, miR-17-5p, miR-20a-5p, miR-20b-5p, miR-519a-3p, miR-519b-3p, miR-519c-3p, miR-519d, miR-520g, miR-520h, miR-526b-3p, miR-93-5p |
| KLF10 | NM_005655 | 3 | C2615A | rs6935 | miR-337-3p, miR-614 |
| KRAS | NM_033360 | 3 | U1049G | rs712 | miR-151a-5p, miR-877-5p |
| KREMEN1 | NM_001039570 | 3 | A5345C | | miR-519a-3p, miR-520b, miR-520c-3p, miR-636 |
| LMNB2 | NM_032737 | 3 | C2928G | | miR-423-5p |
| LMNB2 | NM_032737 | 3 | C2929A | | miR-423-5p |
| NCK2 | NM_003581 | 3 | G1974U | | miR-137, miR-488-3p |
| NDUFB7 | NM_004146 | 5 | G39C | rs45628939 | miR-192-5p, miR-215 |
| NR1D2 | NM_005126 | 3 | A2782G | | miR-338-5p |
| NR1D2 | NM_005126 | 3 | U2791C | | miR-504 |
| P4HA1 | NM_001017962 | 5 | C174G | | miR-412 |
| P4HA1 | NM_001142595 | 5 | C174G | | miR-412 |
| PANX1 | NM_015368 | 3 | G2082A | rs1046805 | miR-10a-5p, miR-10b-5p |
| PBX1 | NM_001204961 | 3 | A4035G | | miR-188-5p |
| PBX1 | NM_001204961 | 3 | G2877U | | miR-187-5p, miR-222-3p |
| PBX1 | NM_001204963 | 3 | A3249G | rs12723035 | miR-326, miR-330-5p |
| PDPK1 | NM_002613 | 3 | U2034G | | miR-504, miR-518a-3p, miR-518b, miR-518d-3p, miR-518f-3p |
| PRKAB2 | NM_005399 | 3 | U4199G | | miR-150-5p, miR-532-3p |
| PTPN1 | NM_002827 | 3 | G2125A | rs118042879 | miR-141-5p, miR-942 |
| RAP1A | NM_002884 | 3 | C1100A | rs6573 | miR-135a-3p, miR-196a-5p, miR-196b-5p |
| SDC4 | NM_002999 | 3 | U1874G | | miR-361-3p |
| SDC4 | NM_002999 | 3 | U1878G | | miR-548d-3p |
| SESN2 | NM_031459 | 3 | A2864C | rs10494394 | miR-182-5p, miR-92a-1-5p, miR-96-5p |
| SLC39A6 | NM_012319 | 3 | C3543U | | miR-144-3p |
| SLC39A6 | NM_012319 | 3 | G3545U | | miR-101-3p, miR-139-5p, miR-144-3p |
| SMAD5 | NM_005903 | 3 | C2825A | | miR-124-3p, miR-500a-3p, miR-501-3p, miR-502-3p, miR-506-3p |
| SMNDC1 | NM_005871 | 3 | G1228A | rs1050755 | miR-329, miR-362-3p |
| SUZ12 | NM_015355 | 3 | C2473G | | miR-30a-3p, miR-30d-3p, miR-30e-3p, miR-452-5p |
| SUZ12 | NM_015355 | 3 | G2475U | | miR-30a-3p, miR-30d-3p, miR-30e-3p, miR-595 |
| SUZ12 | NM_015355 | 3 | U2474A | | miR-30a-3p, miR-30e-3p |
| TNFRSF19 | NM_001204458 | 3 | A2148U | | miR-766-3p |
| TNFRSF19 | NM_001204458 | 3 | A2452U | rs79570196 | miR-26a-5p |
| TOMM20 | NM_014765 | 3 | A3198C | | miR-149-3p |
| TOMM20 | NM_014765 | 3 | U3378G | | miR-129-1-3p, miR-129-2-3p |
| TOMM20 | NM_014765 | 3 | U3379G | | miR-150-5p, miR-532-3p |

The miRNA IDs are boldface if the interaction is predominant in the wild-type (destroy or alter with ) and italics if the interaction is specific to the mutant (create or alter with ); the hsa- prefix is omitted for brevity.

Table 4

List of miRNAs with more than two targets in the filtered data set.

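| Gene | mRNA | UTR | SNV | miRNA | ΔG WT | ΔG SNV | dbSNP 135 |
|---|---|---|---|---|---|---|---|
| CALM1 | NM_006888 | 3 | C2472G | hsa-miR-29a-3p | N/A | −16.70 | |
| CKLF | NM_016326 | 5 | U72G | hsa-miR-29a-3p | N/A | −20.70 | |
| IGFBP5 | NM_000599 | 3 | G3898U | hsa-miR-29a-3p | N/A | −13.83 | rs13403592 |
| CALM1 | NM_006888 | 3 | C2472G | hsa-miR-29b-3p | N/A | −18.12 | |
| CKLF | NM_016326 | 5 | U72G | hsa-miR-29b-3p | N/A | −18.76 | |
| IGFBP5 | NM_000599 | 3 | G3898U | hsa-miR-29b-3p | N/A | −11.93 | rs13403592 |
| CALM1 | NM_006888 | 3 | C2472G | hsa-miR-29c-3p | N/A | −15.13 | |
| CKLF | NM_016326 | 5 | U72G | hsa-miR-29c-3p | N/A | −18.83 | |
| IGFBP5 | NM_000599 | 3 | G3898U | hsa-miR-29c-3p | N/A | −12.71 | rs13403592 |
| SUZ12 | NM_015355 | 3 | C2473G | hsa-miR-30a-3p | −12.66 | −8.97 | |
| SUZ12 | NM_015355 | 3 | G2475U | hsa-miR-30a-3p | −12.66 | −6.32 | |
| SUZ12 | NM_015355 | 3 | U2474A | hsa-miR-30a-3p | −12.66 | −6.86 | |
| SUZ12 | NM_015355 | 3 | C2473G | hsa-miR-30e-3p | −12.53 | −8.02 | |
| SUZ12 | NM_015355 | 3 | G2475U | hsa-miR-30e-3p | −12.53 | −6.19 | |
| SUZ12 | NM_015355 | 3 | U2474A | hsa-miR-30e-3p | −12.53 | −6.73 | |
| BCL2L13 | NM_001270731 | 3 | U1850G | hsa-miR-361-3p | N/A | −17.85 | rs725768 |
| SDC4 | NM_002999 | 3 | U1874G | hsa-miR-361-3p | −15.88 | −20.18 | |
| SOX4 | NM_003107 | 3 | G4753A | hsa-miR-361-3p | −20.76 | −14.48 | rs11556729 |
| BCL2L13 | NM_001270731 | 3 | U2269A | hsa-miR-519b-3p | N/A | −11.07 | rs74932682 |
| JAK1 | NM_002227 | 3 | U5007A | hsa-miR-519b-3p | −16.69 | −12.31 | |
| OSMR | NM_003999 | 3 | C4534U | hsa-miR-519b-3p | N/A | −13.92 | |
| FAM46C | NM_017709 | 3 | A1459G | hsa-miR-614 | −17.89 | −22.28 | rs2066411 |
| KLF10 | NM_005655 | 3 | C2615A | hsa-miR-614 | −22.30 | −17.54 | rs6935 |
| RHEB | NM_005614 | 3 | A1229G | hsa-miR-614 | N/A | −15.19 | |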
Of the 148 miRNAs (responsible for 186 putative interactions) in our final candidate set, 89 are lung cancer-associated miRNAs (in 117 interactions), as indicated in File S4.xlsx. Table 5 lists all 14 putative target sites in our final candidate set that involve NSCLC-associated miRNAs. Notably, the list includes four miRNAs with more than one predicted target site changed. For miR-184, one target site is created while another is weakened upon introduction of the mutation. Moreover, miR-30a, miR-30d, and miR-30e are predicted to target the 3′ UTR of the SUZ12 gene. However, the predicted interactions are likely to be functional in the wild-type and lost in the mutant due to SNV-induced changes in the seed region. SUZ12 has previously been shown to be directly targeted by miR-200b, and inhibition of this miRNA increases the formation of cancer stem cells (CSCs) [62], which contribute to tumor aggressiveness. The loss of miR-30 regulation caused by (one of) the three adjacent SNVs in the seed of the target site could have a similar effect in NSCLC.
Table 5

List of target predictions of NSCLC-associated miRNAs derived from the microRNA body map [45].

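| Gene | mRNA | UTR | SNV | miRNA | ΔG WT | ΔG SNV | dbSNP 135 |
|---|---|---|---|---|---|---|---|
| DHCR24 | NM_014762 | 3 | A4192C | hsa-miR-7-5p | N/A | −11.85 | |
| EIF4EBP2 | NM_004096 | 3 | C5092G | hsa-miR-15b-5p | −16.10 | −10.65 | |
| EIF4EBP2 | NM_004096 | 3 | C5092G | hsa-miR-16-5p | −18.20 | −13.97 | |
| EIF4EBP2 | NM_004096 | 3 | C5092G | hsa-miR-195-5p | −17.63 | −12.23 | |
| KIF3B | NM_004798 | 3 | G5433A | hsa-miR-184 | −21.40 | −14.40 | rs41289846 |
| MED16 | NM_005481 | 5 | A129U | hsa-miR-184 | N/A | −20.65 | |
| SUZ12 | NM_015355 | 3 | C2473G | hsa-miR-30a-3p | −12.66 | −8.97 | |
| SUZ12 | NM_015355 | 3 | C2473G | hsa-miR-30d-3p | −11.68 | −8.65 | |
| SUZ12 | NM_015355 | 3 | C2473G | hsa-miR-30e-3p | −12.53 | −8.02 | |
| SUZ12 | NM_015355 | 3 | G2475U | hsa-miR-30a-3p | −12.66 | −6.32 | |
| SUZ12 | NM_015355 | 3 | G2475U | hsa-miR-30d-3p | −11.68 | −5.57 | |
| SUZ12 | NM_015355 | 3 | G2475U | hsa-miR-30e-3p | −12.53 | −6.19 | |
| SUZ12 | NM_015355 | 3 | U2474A | hsa-miR-30a-3p | −12.66 | −6.86 | |
| SUZ12 | NM_015355 | 3 | U2474A | hsa-miR-30e-3p | −12.53 | −6.73 | |
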
Furthermore, for 48 SNVs the predicted miRTSs were located inside the local region where a significant secondary structural change was predicted by RNAsnp. Of these, 15 SNVs were located in cancer-associated genes (see Table 6). Based on previous studies [63], [64], we speculate that an SNV-induced miRTS change, together with secondary structural changes in and around the miRTS, can potentially affect the binding of the predicted miRNA.
Table 6

List of predicted miRNA target site changes that overlap with RNAsnp predictions.

| Gene | mRNA | UTR | SNV | miRNA | ΔG WT | ΔG SNV | RNAsnp (p-value) a |
|---|---|---|---|---|---|---|---|
| DHCR24 | NM_014762 | 3 | A4192C | hsa-miR-7-5p | N/A | −11.85 | 0.0159 |
| EIF4EBP2 | NM_004096 | 3 | C5092G | hsa-miR-15b-5p | −16.10 | −10.65 | 0.0341 |
| EIF4EBP2 | NM_004096 | 3 | C5092G | hsa-miR-16-5p | −18.20 | −13.97 | 0.0341 |
| EIF4EBP2 | NM_004096 | 3 | C5092G | hsa-miR-195-5p | −17.63 | −12.23 | 0.0341 |
| EIF4EBP2 | NM_004096 | 3 | C5092G | hsa-miR-424-5p | −16.13 | −12.18 | 0.0341 |
| EIF4EBP2 | NM_004096 | 3 | C5092G | hsa-miR-503-5p | −17.19 | −12.42 | 0.0341 |
| EIF4EBP2 | NM_004096 | 3 | C5092G | hsa-miR-646 | −13.75 | −9.56 | 0.0341 |
| ATP6V1C2 | NM_144583 | 3 | G2321C | hsa-miR-615-3p | N/A | −18.91 | 0.0483 |
| NOP10 | NM_018648 | 3 | G432A | hsa-miR-342-3p | −22.81 | −18.51 | 0.0518 |
| RAD21 | NM_006265 | 3 | G3118U | hsa-miR-361-5p | −11.01 | N/A | 0.0696 |
| PANX1 | NM_015368 | 3 | G2082A | hsa-miR-10a-5p | −14.54 | −7.67 | 0.0704 |
| PANX1 | NM_015368 | 3 | G2082A | hsa-miR-10b-5p | −13.85 | N/A | 0.0704 |
| CCND2 | NM_001759 | 3 | G5917U | hsa-miR-139-5p | −13.21 | −7.52 | 0.0726 |
| SESN2 | NM_031459 | 3 | A2864C | hsa-miR-92a-1-5p | −14.56 | −21.55 | 0.0818 |
| SESN2 | NM_031459 | 3 | A2864C | hsa-miR-96-5p | −14.76 | −9.52 | 0.0818 |
| SESN2 | NM_031459 | 3 | A2864C | hsa-miR-182-5p | −19.86 | −15.36 | 0.0818 |
| PPA1 | NM_021129 | 5 | G92U | hsa-miR-378a-5p | −15.90 | −20.85 | 0.083 |
| SLC39A6 | NM_012319 | 3 | G3545U | hsa-miR-144-3p | −11.40 | −8.09 | 0.0832 |
| SLC39A6 | NM_012319 | 3 | G3545U | hsa-miR-101-3p | −20.37 | −14.03 | 0.0832 |
| SLC39A6 | NM_012319 | 3 | G3545U | hsa-miR-139-5p | −20.37 | −14.03 | 0.0832 |
| PPA2 | NM_006903 | 3 | A983U | hsa-miR-139-3p | N/A | −18.24 | 0.0841 |
| TNFRSF19 | NM_001204458 | 3 | A2148U | hsa-miR-766-3p | N/A | −14.71 | 0.0874 |
| SLC39A6 | NM_012319 | 3 | C3543U | hsa-miR-144-3p | −11.40 | −6.70 | 0.0924 |
| CRYL1 | NM_015974 | 3 | U1350A | hsa-miR-330-5p | N/A | −19.74 | 0.0487* |
| HIPK2 | NM_001113239 | 3 | U7743G | hsa-miR-181a-2-3p | N/A | −14.08 | 0.0581* |

a SNV predicted by rmin measure is highlighted with *.
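
The create, alter and destroy labels used for the miRNA target site changes can be read off the two energy columns of Table 6. The following minimal sketch, with hypothetical function and variable names, treats N/A as "no site predicted" for that allele and uses only the sign of the difference between the two ΔG values to orient an altered site. This is a simplifying assumption for illustration: the pipeline itself applies the lr cutoff described in Materials and Methods, which is not reproduced here. The example rows are taken from Table 6.

```python
from typing import Optional


def classify_site(dg_wt: Optional[float], dg_snv: Optional[float]) -> str:
    """Label a target site change from the wild-type and SNV energies.

    None stands for "N/A" in the table (no site predicted for that allele).
    """
    if dg_wt is None and dg_snv is None:
        raise ValueError("no site predicted for either allele")
    if dg_wt is None:
        return "create"   # site predicted only with the SNV
    if dg_snv is None:
        return "destroy"  # site predicted only in the wild-type
    # Both alleles have a site; a more negative energy means stronger binding.
    if dg_snv < dg_wt:
        return "alter (stronger in mutant)"
    if dg_snv > dg_wt:
        return "alter (stronger in wild-type)"
    return "unchanged"


# (gene, miRNA, dG wild-type, dG SNV), copied from Table 6.
rows = [
    ("DHCR24", "hsa-miR-7-5p", None, -11.85),
    ("RAD21", "hsa-miR-361-5p", -11.01, None),
    ("EIF4EBP2", "hsa-miR-16-5p", -18.20, -13.97),
    ("SESN2", "hsa-miR-92a-1-5p", -14.56, -21.55),
]

for gene, mirna, dg_wt, dg_snv in rows:
    print(gene, mirna, classify_site(dg_wt, dg_snv))
```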


Functional analysis of genes predicted with SNVs' effect on UTRs

To illustrate how the candidate SNVs obtained from our pipeline can be further analyzed for potential functionality and co-operability, we investigated the resulting sets from the miRNA and RNAsnp analyses individually as well as in combination, each before and after intersection with cancer-related genes. More precisely, the following six gene sets were tested with Ingenuity Pathways Analysis (see Table 7): 490 SNVs corresponding to 447 genes from the miRNA analysis (miRNA in all genes); 124 SNVs corresponding to 104 genes that overlap with our cancer gene set (miRNA in cancer-related genes); 472 SNVs associated with 408 genes from the RNAsnp analysis (RNAsnp in all genes); 111 SNVs corresponding to 89 genes that intersect with our cancer gene set (RNAsnp in cancer-related genes); a unique gene list obtained by combining the 447 genes from the miRNA analysis and the 408 genes from the RNAsnp analysis (miRNA and RNAsnp overlap in all genes); and a unique cancer gene list obtained from the 104 genes from the miRNA analysis and the 89 genes from the RNAsnp analysis (miRNA and RNAsnp overlap in cancer-related genes).
Table 7

Summary of pathway analysis results using Ingenuity pathway analysis software.

| miRNA in all genes | miRNA in cancer related genes | RNAsnp in all genes | RNAsnp in cancer related genes | miRNA and RNAsnp overlap in all genes | miRNA and RNAsnp overlap in cancer related genes |
|---|---|---|---|---|---|
| Top 3 networks | | | | | |
| 1. Cell Death and Survival, Cardiovascular System Development and Function, Organismal Development (44) | 1. Cell Death and Survival, Cellular Growth and Proliferation, DNA Replication, Recombination, and Repair (34) | 1. Skeletal and Muscular System Development and Function, Cell Death and Survival, Cardiovascular System Development and Function (40) | 1. Cellular Growth and Proliferation, Cell Death and Survival, Cellular Development (42) | 1. Cell Signaling, Nucleic Acid Metabolism, Small Molecule Biochemistry (38) | 1. Gene Expression, Cell Death and Survival, Cancer (43) |
| 2. Cell Death and Survival, Cell-To-Cell Signaling and Interaction, Nervous System Development and Function (29) | 2. Cellular Growth and Proliferation, Cell Death and Survival, Cardiovascular System Development and Function (16) | 2. Cell Death and Survival, Cellular Function and Maintenance, Cell Morphology (40) | 2. Cell Death and Survival, Cellular Assembly and Organization, Cell Cycle (37) | 2. Cellular Growth and Proliferation, Cell Morphology, Cellular Assembly and Organization (34) | 2. Cellular Growth and Proliferation, Cell Death and Survival, Cellular Assembly and Organization (41) |
| 3. Hematological Disease, Immunological Disease, Cellular Development (27) | 3. Cardiovascular Disease, Gene Expression, Organismal Development (14) | 3. Cellular Assembly and Organization, Post-Translational Modification, Cellular Movement (32) | 3. Gene Expression, Cellular Growth and Proliferation, Embryonic Development (24) | 3. Cellular Movement, Cell Death and Survival, Cardiovascular System Development and Function (34) | 3. Cell Death and Survival, Dermatological Diseases and Conditions, Cellular Development (34) |
| Top 3 diseases and disorders | | | | | |
| 1. Infectious Disease (1.75E-4–4.85E-2) | 1. Cancer (4.36E-6–4.90E-2) | 1. Cancer (2.03E-4–4.41E-2) | 1. Cancer (9.42E-8–1.27E-2) | 1. Infectious Disease (1.04E-5–4.31E-2) | 1. Cancer (8.23E-10–1.22E-2) |
| 2. Cancer (1.42E-3–4.71E-2) | 2. Hematological Disease (1.16E-4–4.05E-2) | 2. Endocrine System Disorders (4.95E-4–2.38E-2) | 2. Hematological Disease (1.46E-5–1.27E-2) | 2. Cancer (1.93E-4–4.31E-2) | 2. Hematological Disease (1.48E-8–1.22E-2) |
| 3. Hepatic System Disease (1.42E-3–2.38E-2) | 3. Endocrine System Disorders (1.36E-4–3.34E-2) | 3. Reproductive System Disease (4.95E-4–4.08E-2) | 3. Gastrointestinal Disease (1.08E-4–1.27E-2) | 3. Hepatic System Disease (3.37E-4–4.31E-2) | 3. Infectious Disease (4.22E-6–8.87E-3) |
| Top 3 molecular and cellular functions | | | | | |
| 1. Protein Synthesis (4.99E-6–2.09E-3) | 1. Cellular Growth and Proliferation (4.03E-9–4.65E-2) | 1. Cellular Growth and Proliferation (6.76E-6–4.41E-2) | 1. Cellular Growth and Proliferation (1.18E-17–1.27E-2) | 1. Cellular Growth and Proliferation (3.14E-7–4.31E-2) | 1. Cellular Growth and Proliferation (5.68E-25–1.22E-2) |
| 2. RNA Post-Transcriptional Modification (5.32E-4–2.38E-2) | 2. Cell Death and Survival (7.07E-8–4.68E-2) | 2. Cell Death and Survival (7.24E-6–4.41E-2) | 2. Cell Death and Survival (4.08E-17–1.27E-2) | 2. Cell Death and Survival (6.1E-7–4.31E-2) | 2. Cell Death and Survival (1.09E-19–1.22E-2) |
| 3. RNA Damage and Repair (5.67E-4–5.67E-4) | 3. Cellular Development (1.02E-6–4.00E-2) | 3. Cellular Assembly and Organization (4.83E-5–4.41E-2) | 3. Cellular Development (5.79E-14–1.27E-2) | 3. Protein Synthesis (2.39E-5–3.73E-2) | 3. Cellular Development (1.89E-18–1.22E-2) |
| Top 3 canonical pathways | | | | | |
| 1. EIF2 Signaling (1.61E-5) | 1. Molecular Mechanisms of Cancer (4.14E-7) | 1. LPS-stimulated MAPK Signaling (3.37E-5) | 1. LPS-stimulated MAPK Signaling (8.4E-10) | 1. iNOS Signaling (4.32E-4) | 1. Molecular Mechanisms of Cancer (3.11E-12) |
| 2. mTOR Signaling (4.31E-4) | 2. Insulin Receptor Signaling (2.47E-6) | 2. iNOS Signaling (3.56E-4) | 2. Molecular Mechanisms of Cancer (2.65E-8) | 2. EIF2 Signaling (4.52E-4) | 2. Glucocorticoid Receptor Signaling (2.88E-9) |
| 3. Insulin Receptor Signaling (9.85E-4) | 3. IGF-1 Signaling (4.17E-6) | 3. Germ Cell-Sertoli Cell Junction Signaling (6.75E-4) | 3. IL-6 Signaling (5.27E-8) | 3. mTOR Signaling (8.85E-4) | 3. LPS-stimulated MAPK Signaling (1.9E-8) |

The numbers at the end of each cell represent the p-values, but for the top networks it is the p-score (−log10 p-value).

Based on the significant p-values obtained from each of our analyses, we have listed the top 3 networks, diseases and disorders, molecular and cellular functions, and canonical pathways in Table 7. Our results indicate that the top networks identified from the six gene sets are highly enriched for cell death and survival as well as cellular growth and proliferation (see Table 7). Figure 4 shows the networks of those two cases that were predicted using the genes from the combination of the miRNA and RNAsnp analyses, whereas the networks from the other gene sets are shown in Figure S4. The networks shown in Figure 4 contain several genes from Tables 2 and 3 that were predicted to have more than one disruptive SNV, as well as genes from Table 6 for which the predictions from the miRNA and RNAsnp analyses overlap. We predict that these genes have a higher chance of being disrupted by SNVs in their UTRs, which might change protein translation and thereby disrupt the interaction of the encoded protein with others.
Figure 4

Network Analysis of genes predicted to have SNVs' effect on UTRs.

The networks represent the interaction between genes that were predicted to have SNVs' effect on UTRs from miRNA and RNAsnp analysis (see Table 7, column 5). The gene nodes were colored to differentiate the known (orange) and unknown (green) cancer-associated genes, and the color outside the node indicates whether the gene comes from miRNA (yellow) or RNAsnp (blue) or both.

The top diseases and disorders associated with the gene sets predominantly include cancer. In addition, the top three canonical pathways related to the gene sets include molecular mechanisms of cancer, LPS-stimulated MAPK signaling, IL-6 signaling, iNOS signaling, EIF2 signaling and mTOR signaling. It should be noted that the enrichment for cancer and related molecular functions is found in our miRNA and RNAsnp gene sets even before intersecting with the list of cancer-associated genes (see Table 7).

Discussion

With the help of whole-genome sequencing technology, the complete genome of a cancer cell can be sequenced effectively to identify somatic single nucleotide variants (SNVs) [65]. To date, more than 50 different cancer types and/or subtypes have been sequenced [2]. The lung cancer genome was first sequenced in 2010 [66]; that study reported somatic variants in both coding and non-coding (UTR and other non-coding RNA) transcribed regions, which constitute 0.6% and 0.8%, respectively, of the 22,910 somatic mutations identified. In a recent study, transcriptome-wide sequencing of non-small cell lung cancer (NSCLC) with wild-type and mutant KRAS revealed 73,717 SNVs comprising both germ-line and somatic variants. Of these SNVs, 29,290 were located in the UTRs of mRNAs that correspond to 6462 genes. We have developed a comprehensive computational pipeline to predict the effects of UTR SNVs that can potentially affect post-transcriptional regulation, either through SNV-induced secondary structure changes in the UTRs or through changes in miRTSs within the UTRs. Using this pipeline, we predicted 472 of the 29,290 UTR SNVs (corresponding to 408 genes) to have a significant effect on the local RNA secondary structure of UTRs. Additionally, 490 of the 29,290 UTR SNVs were predicted to cause changes in a miRNA target site within the UTRs of 447 genes. Of these 490 SNVs, 124 were present in 104 genes that were previously known to be cancer-associated. For these 104 genes, 148 miRNAs were predicted to bind either the wild-type or the mutant. We found that 89 of these 148 miRNAs overlap with lung cancer-associated miRNAs, while eight miRNAs are specifically associated with NSCLC. Taken together, the disruptive SNVs predicted to affect secondary structure or miRNA target sites were present in 803 different genes, of which 188 (23.4%) were previously known to be cancer-associated.
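
An enrichment of this kind can be checked with a one-sided Fisher's exact test, which is equivalent to the upper tail of a hypergeometric distribution. The sketch below is a minimal, standard-library illustration and not the authors' code. It assumes that the 803 genes with disruptive SNVs are treated as a draw from the 6462 genes in the data set, of which 1347 are known to be cancer-associated; the exact layout of the contingency table is not given in the text, and the function name is hypothetical.

```python
from math import comb


def fisher_one_sided(hits: int, draws: int, successes: int, population: int) -> float:
    """Return P(X >= hits) for X ~ Hypergeometric(population, successes, draws).

    This is the one-sided ("greater") Fisher's exact p-value for the 2x2 table
    that splits the population into drawn / not drawn and
    success / non-success.
    """
    failures = population - successes
    upper = min(draws, successes)
    # Exact integer arithmetic; the division at the end yields a float.
    tail = sum(
        comb(successes, k) * comb(failures, draws - k)
        for k in range(hits, upper + 1)
    )
    return tail / comb(population, draws)


# Gene counts from the text; the table layout is an assumption (see above).
p_value = fisher_one_sided(hits=188, draws=803, successes=1347, population=6462)

print(f"ratio among disruptive-SNV genes: {188 / 803:.3f}")
print(f"ratio in the whole data set:      {1347 / 6462:.3f}")
print(f"one-sided p-value:                {p_value:.3f}")
```

If the assumed table matches the one actually used, the printed p-value should fall below 0.05. A different table layout (for example, one that treats the two groups as independent samples) would give a somewhat different value.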
Notably, this ratio is significantly higher (p-value 0.032, one-sided Fisher's exact test) than the ratio (20.8%) of known cancer-associated genes (n = 1347) in our initial data set of 6462 genes. A similar enrichment (p-value 0.040, one-sided Fisher's exact test) was observed when comparing the ratio of disruptive SNVs in cancer-associated genes to all other genes against the ratio of data set SNVs in cancer-associated genes to all other genes. However, when comparing the ratios separately for the results of the RNA secondary structure and miRTS analyses, we did not find any significant difference (data not shown). Further, the IPA network analysis (which addresses the biological relationships between genes/gene products) shows that the physical interactions of genes predicted to carry an SNV effect might be involved in cell death and survival as well as cellular growth and proliferation. However, further analysis of these networks with respect to topology (e.g., edge counts, neighborhood connectivity, in- and out-degree) is required. The functional analysis using IPA shows that the genes from our pipeline are involved in canonical pathways such as molecular mechanisms of cancer, IL-6 signaling, LPS-stimulated MAPK signaling, iNOS signaling, EIF2 signaling and mTOR signaling.

Given the large data set of 29,290 SNVs and the generally high false positive rate of established miRNA target prediction methods, we chose stringent filters in the miRNA analysis. The requirement of a TargetScan seed change, used to reduce the initial set of pairs, is met in 60% of our benchmark data (Table S1). The intersection of candidates in the alter and destroy sets with Ago CLIP-Seq data is another conservative filter.
Due to incompleteness of the data, this filters out some true interactions (as can be seen from Table S1), but gives higher confidence in the remaining candidates (i.e., not all known miRNA interactions overlap with the Ago CLIP-Seq peak clusters, but if there is a cluster with BC≥2 there is likely a real interaction). Interactions from the create set are not subjected to that filter, so the resulting candidates might be biased towards that set. Individual filters can be left out or made more or less conservative depending on the data set at hand.

It should also be noted that this data set contains both germ-line and somatic variants. In the previous study, no normal cells were sequenced in parallel to the lung adenocarcinomas to remove germ-line variants. Based on the overlap of these SNVs with dbSNP (build 135), we estimate that 40% of the 29,290 SNVs are germ-line variants. However, we did not remove these germ-line variants from the analysis because they are also important and may play a role in differences in cancer predisposition and drug response between individuals.

In summary, we hypothesize that SNVs predicted to cause significant changes in the secondary structure of UTRs or in miRNA target sites within UTRs have the potential to alter the expression of genes linked to cancer-associated pathways, and thereby contribute to the development of cancer. Although we do not provide experimental validation to support these predictions, we have highlighted the significant causative SNVs, which will be helpful for further detailed investigation. One example is the SNV U1552G, which affects the structure of a cis-acting regulatory element, the selenocysteine insertion sequence (SECIS) (Figure 4), which is associated with the translational control of GPX3 mRNA. The computational pipeline presented here can be adopted for UTR SNV data from other cancer genome and transcriptome studies.
It is worth considering that SNVs outside the protein-coding regions can have functional impacts that alter the expression of a gene. This may help in the identification of new cancer driver mutations. Future directions include protein binding site predictions on both structured and unstructured parts of the UTRs. Merging with the growing amount of experimental data on RNA-binding proteins (e.g., CLIP-seq), that is, more general types of data than those related to miRNA targets, should provide complementary information, for example an additional ranking of predicted binding site structure disruption. Further, if such data are extracted from disease tissue, they should provide yet another complementary layer of data pointing to specific candidates.

Overview of RNAsnp predictions. (PDF)
Distribution of GC-content in regions around (disruptive) SNVs. (PDF)
Histogram of values in miRNA analysis set. (PDF)
Top three IPA networks for the six different gene sets. (PDF)
All candidates from RNAsnp analysis. Excel table listing 472 SNVs in 408 genes. (XLSX)
Medium-confidence candidates from RNAsnp analysis. Subset of File S1, lists 83 SNVs in cancer-associated genes with either dmax or rmin p-value<0.1 (but not both <0.05). (XLSX)
RNAsnp predicted effect of SNPs in LD with disruptive SNVs. (XLSX)
Candidates from miRNA analysis. Excel table listing 490 SNVs in 447 genes predicted to affect target sites of 344 miRNAs, with indication of cancer-associated genes and lung cancer-associated miRNAs. (XLSX)
Filtration steps in the miRNA pipeline tested on known examples. (PDF)
  62 in total

1.  Conserved seed pairing, often flanked by adenosines, indicates that thousands of human genes are microRNA targets.

Authors:  Benjamin P Lewis; Christopher B Burge; David P Bartel
Journal:  Cell       Date:  2005-01-14       Impact factor: 41.582

2.  CMfinder--a covariance model based RNA motif finding algorithm.

Authors:  Zizhen Yao; Zasha Weinberg; Walter L Ruzzo
Journal:  Bioinformatics       Date:  2005-12-15       Impact factor: 6.937

Review 3.  Cis-acting determinants of asymmetric, cytoplasmic RNA transport.

Authors:  Ashwini Jambhekar; Joseph L Derisi
Journal:  RNA       Date:  2007-05       Impact factor: 4.942

4.  SNAP: a web-based tool for identification and annotation of proxy SNPs using HapMap.

Authors:  Andrew D Johnson; Robert E Handsaker; Sara L Pulit; Marcia M Nizzari; Christopher J O'Donnell; Paul I W de Bakker
Journal:  Bioinformatics       Date:  2008-10-30       Impact factor: 6.937

5.  Genome-wide dissection of microRNA functions and cotargeting networks using gene set signatures.

Authors:  John S Tsang; Margaret S Ebert; Alexander van Oudenaarden
Journal:  Mol Cell       Date:  2010-04-09       Impact factor: 17.970

6.  International network of cancer genome projects.

Authors:  Thomas J Hudson; Warwick Anderson; Axel Artez; Anna D Barker; Cindy Bell; Rosa R Bernabé; M K Bhan; Fabien Calvo; Iiro Eerola; Daniela S Gerhard; Alan Guttmacher; Mark Guyer; Fiona M Hemsley; Jennifer L Jennings; David Kerr; Peter Klatt; Patrik Kolar; Jun Kusada; David P Lane; Frank Laplace; Lu Youyong; Gerd Nettekoven; Brad Ozenberger; Jane Peterson; T S Rao; Jacques Remacle; Alan J Schafer; Tatsuhiro Shibata; Michael R Stratton; Joseph G Vockley; Koichi Watanabe; Huanming Yang; Matthew M F Yuen; Bartha M Knoppers; Martin Bobrow; Anne Cambon-Thomsen; Lynn G Dressler; Stephanie O M Dyke; Yann Joly; Kazuto Kato; Karen L Kennedy; Pilar Nicolás; Michael J Parker; Emmanuelle Rial-Sebbag; Carlos M Romeo-Casabona; Kenna M Shaw; Susan Wallace; Georgia L Wiesner; Nikolajs Zeps; Peter Lichter; Andrew V Biankin; Christian Chabannon; Lynda Chin; Bruno Clément; Enrique de Alava; Françoise Degos; Martin L Ferguson; Peter Geary; D Neil Hayes; Thomas J Hudson; Amber L Johns; Arek Kasprzyk; Hidewaki Nakagawa; Robert Penny; Miguel A Piris; Rajiv Sarin; Aldo Scarpa; Tatsuhiro Shibata; Marc van de Vijver; P Andrew Futreal; Hiroyuki Aburatani; Mónica Bayés; David D L Botwell; Peter J Campbell; Xavier Estivill; Daniela S Gerhard; Sean M Grimmond; Ivo Gut; Martin Hirst; Carlos López-Otín; Partha Majumder; Marco Marra; John D McPherson; Hidewaki Nakagawa; Zemin Ning; Xose S Puente; Yijun Ruan; Tatsuhiro Shibata; Michael R Stratton; Hendrik G Stunnenberg; Harold Swerdlow; Victor E Velculescu; Richard K Wilson; Hong H Xue; Liu Yang; Paul T Spellman; Gary D Bader; Paul C Boutros; Peter J Campbell; Paul Flicek; Gad Getz; Roderic Guigó; Guangwu Guo; David Haussler; Simon Heath; Tim J Hubbard; Tao Jiang; Steven M Jones; Qibin Li; Nuria López-Bigas; Ruibang Luo; Lakshmi Muthuswamy; B F Francis Ouellette; John V Pearson; Xose S Puente; Victor Quesada; Benjamin J Raphael; Chris Sander; Tatsuhiro Shibata; Terence P Speed; Lincoln D Stein; Joshua M Stuart; Jon W Teague; Yasushi Totoki; 
Tatsuhiko Tsunoda; Alfonso Valencia; David A Wheeler; Honglong Wu; Shancen Zhao; Guangyu Zhou; Lincoln D Stein; Roderic Guigó; Tim J Hubbard; Yann Joly; Steven M Jones; Arek Kasprzyk; Mark Lathrop; Nuria López-Bigas; B F Francis Ouellette; Paul T Spellman; Jon W Teague; Gilles Thomas; Alfonso Valencia; Teruhiko Yoshida; Karen L Kennedy; Myles Axton; Stephanie O M Dyke; P Andrew Futreal; Daniela S Gerhard; Chris Gunter; Mark Guyer; Thomas J Hudson; John D McPherson; Linda J Miller; Brad Ozenberger; Kenna M Shaw; Arek Kasprzyk; Lincoln D Stein; Junjun Zhang; Syed A Haider; Jianxin Wang; Christina K Yung; Anthony Cros; Anthony Cross; Yong Liang; Saravanamuttu Gnaneshan; Jonathan Guberman; Jack Hsu; Martin Bobrow; Don R C Chalmers; Karl W Hasel; Yann Joly; Terry S H Kaan; Karen L Kennedy; Bartha M Knoppers; William W Lowrance; Tohru Masui; Pilar Nicolás; Emmanuelle Rial-Sebbag; Laura Lyman Rodriguez; Catherine Vergely; Teruhiko Yoshida; Sean M Grimmond; Andrew V Biankin; David D L Bowtell; Nicole Cloonan; Anna deFazio; James R Eshleman; Dariush Etemadmoghadam; Brooke B Gardiner; Brooke A Gardiner; James G Kench; Aldo Scarpa; Robert L Sutherland; Margaret A Tempero; Nicola J Waddell; Peter J Wilson; John D McPherson; Steve Gallinger; Ming-Sound Tsao; Patricia A Shaw; Gloria M Petersen; Debabrata Mukhopadhyay; Lynda Chin; Ronald A DePinho; Sarah Thayer; Lakshmi Muthuswamy; Kamran Shazand; Timothy Beck; Michelle Sam; Lee Timms; Vanessa Ballin; Youyong Lu; Jiafu Ji; Xiuqing Zhang; Feng Chen; Xueda Hu; Guangyu Zhou; Qi Yang; Geng Tian; Lianhai Zhang; Xiaofang Xing; Xianghong Li; Zhenggang Zhu; Yingyan Yu; Jun Yu; Huanming Yang; Mark Lathrop; Jörg Tost; Paul Brennan; Ivana Holcatova; David Zaridze; Alvis Brazma; Lars Egevard; Egor Prokhortchouk; Rosamonde Elizabeth Banks; Mathias Uhlén; Anne Cambon-Thomsen; Juris Viksna; Fredrik Ponten; Konstantin Skryabin; Michael R Stratton; P Andrew Futreal; Ewan Birney; Ake Borg; Anne-Lise Børresen-Dale; Carlos Caldas; John A Foekens; 
Sancha Martin; Jorge S Reis-Filho; Andrea L Richardson; Christos Sotiriou; Hendrik G Stunnenberg; Giles Thoms; Marc van de Vijver; Laura van't Veer; Fabien Calvo; Daniel Birnbaum; Hélène Blanche; Pascal Boucher; Sandrine Boyault; Christian Chabannon; Ivo Gut; Jocelyne D Masson-Jacquemier; Mark Lathrop; Iris Pauporté; Xavier Pivot; Anne Vincent-Salomon; Eric Tabone; Charles Theillet; Gilles Thomas; Jörg Tost; Isabelle Treilleux; Fabien Calvo; Paulette Bioulac-Sage; Bruno Clément; Thomas Decaens; Françoise Degos; Dominique Franco; Ivo Gut; Marta Gut; Simon Heath; Mark Lathrop; Didier Samuel; Gilles Thomas; Jessica Zucman-Rossi; Peter Lichter; Roland Eils; Benedikt Brors; Jan O Korbel; Andrey Korshunov; Pablo Landgraf; Hans Lehrach; Stefan Pfister; Bernhard Radlwimmer; Guido Reifenberger; Michael D Taylor; Christof von Kalle; Partha P Majumder; Rajiv Sarin; T S Rao; M K Bhan; Aldo Scarpa; Paolo Pederzoli; Rita A Lawlor; Massimo Delledonne; Alberto Bardelli; Andrew V Biankin; Sean M Grimmond; Thomas Gress; David Klimstra; Giuseppe Zamboni; Tatsuhiro Shibata; Yusuke Nakamura; Hidewaki Nakagawa; Jun Kusada; Tatsuhiko Tsunoda; Satoru Miyano; Hiroyuki Aburatani; Kazuto Kato; Akihiro Fujimoto; Teruhiko Yoshida; Elias Campo; Carlos López-Otín; Xavier Estivill; Roderic Guigó; Silvia de Sanjosé; Miguel A Piris; Emili Montserrat; Marcos González-Díaz; Xose S Puente; Pedro Jares; Alfonso Valencia; Heinz Himmelbauer; Heinz Himmelbaue; Victor Quesada; Silvia Bea; Michael R Stratton; P Andrew Futreal; Peter J Campbell; Anne Vincent-Salomon; Andrea L Richardson; Jorge S Reis-Filho; Marc van de Vijver; Gilles Thomas; Jocelyne D Masson-Jacquemier; Samuel Aparicio; Ake Borg; Anne-Lise Børresen-Dale; Carlos Caldas; John A Foekens; Hendrik G Stunnenberg; Laura van't Veer; Douglas F Easton; Paul T Spellman; Sancha Martin; Anna D Barker; Lynda Chin; Francis S Collins; Carolyn C Compton; Martin L Ferguson; Daniela S Gerhard; Gad Getz; Chris Gunter; Alan Guttmacher; Mark Guyer; D Neil Hayes; 
Eric S Lander; Brad Ozenberger; Robert Penny; Jane Peterson; Chris Sander; Kenna M Shaw; Terence P Speed; Paul T Spellman; Joseph G Vockley; David A Wheeler; Richard K Wilson; Thomas J Hudson; Lynda Chin; Bartha M Knoppers; Eric S Lander; Peter Lichter; Lincoln D Stein; Michael R Stratton; Warwick Anderson; Anna D Barker; Cindy Bell; Martin Bobrow; Wylie Burke; Francis S Collins; Carolyn C Compton; Ronald A DePinho; Douglas F Easton; P Andrew Futreal; Daniela S Gerhard; Anthony R Green; Mark Guyer; Stanley R Hamilton; Tim J Hubbard; Olli P Kallioniemi; Karen L Kennedy; Timothy J Ley; Edison T Liu; Youyong Lu; Partha Majumder; Marco Marra; Brad Ozenberger; Jane Peterson; Alan J Schafer; Paul T Spellman; Hendrik G Stunnenberg; Brandon J Wainwright; Richard K Wilson; Huanming Yang
Journal:  Nature       Date:  2010-04-15       Impact factor: 49.962

Review 7.  Cancer genome-sequencing study design.

Authors:  Jill C Mwenifumbo; Marco A Marra
Journal:  Nat Rev Genet       Date:  2013-05       Impact factor: 53.242

8.  Multiple independent analyses reveal only transcription factors as an enriched functional class associated with microRNAs.

Authors:  Larry Croft; Damian Szklarczyk; Lars Juhl Jensen; Jan Gorodkin
Journal:  BMC Syst Biol       Date:  2012-07-23

9.  starBase: a database for exploring microRNA-mRNA interaction maps from Argonaute CLIP-Seq and Degradome-Seq data.

Authors:  Jian-Hua Yang; Jun-Hao Li; Peng Shao; Hui Zhou; Yue-Qin Chen; Liang-Hu Qu
Journal:  Nucleic Acids Res       Date:  2010-10-30       Impact factor: 16.971

10.  Novel structural determinants in human SECIS elements modulate the translational recoding of UGA as selenocysteine.

Authors:  Lynda Latrèche; Olivier Jean-Jean; Donna M Driscoll; Laurent Chavatte
Journal:  Nucleic Acids Res       Date:  2009-08-03       Impact factor: 16.971


Review 1.  Interpreting functional effects of coding variants: challenges in proteome-scale prediction, annotation and assessment.

Authors:  Khader Shameer; Lokesh P Tripathi; Krishna R Kalari; Joel T Dudley; Ramanathan Sowdhamini
Journal:  Brief Bioinform       Date:  2015-10-22       Impact factor: 11.622

2.  miR-211 promotes non-small cell lung cancer proliferation by targeting SRCIN1.

Authors:  Leiguang Ye; Hui Wang; Baogang Liu
Journal:  Tumour Biol       Date:  2015-08-16

Review 3.  The roles of RNA processing in translating genotype to phenotype.

Authors:  Kassie S Manning; Thomas A Cooper
Journal:  Nat Rev Mol Cell Biol       Date:  2016-11-16       Impact factor: 94.444

Review 4.  The effects of structure on pre-mRNA processing and stability.

Authors:  Rachel Soemedi; Kamil J Cygan; Christy L Rhine; David T Glidden; Allison J Taggart; Chien-Ling Lin; Alger M Fredericks; William G Fairbrother
Journal:  Methods       Date:  2017-06-06       Impact factor: 3.608

Review 5.  Intronic RNA: Ad'junk' mediator of post-transcriptional gene regulation.

Authors:  Christopher R Neil; William G Fairbrother
Journal:  Biochim Biophys Acta Gene Regul Mech       Date:  2019-11-01       Impact factor: 4.490

Review 6.  Competing endogenous RNA interplay in cancer: mechanism, methodology, and perspectives.

Authors:  Dong-Liang Cheng; Yuan-Yuan Xiang; Li-juan Ji; Xiao-Jie Lu
Journal:  Tumour Biol       Date:  2015-01-22

7.  Differential miRNA expressions in peripheral blood mononuclear cells for diagnosis of lung cancer.

Authors:  Jie Ma; Yanli Lin; Min Zhan; Dean L Mann; Sanford A Stass; Feng Jiang
Journal:  Lab Invest       Date:  2015-07-06       Impact factor: 5.662

8.  Functional classification of 15 million SNPs detected from diverse chicken populations.

Authors:  Almas A Gheyas; Clarissa Boschiero; Lel Eory; Hannah Ralph; Richard Kuo; John A Woolliams; David W Burt
Journal:  DNA Res       Date:  2015-04-29       Impact factor: 4.458

Review 9.  MicroRNA-mediated regulation of KRAS in cancer.

Authors:  Minlee Kim; Frank J Slack
Journal:  J Hematol Oncol       Date:  2014-11-30       Impact factor: 17.388

Review 10.  The potential of microRNAs in personalized medicine against cancers.

Authors:  Anne Saumet; Anthony Mathelier; Charles-Henri Lecellier
Journal:  Biomed Res Int       Date:  2014-08-28       Impact factor: 3.411

