
Computational evidence of NAGNAG alternative splicing in human large intergenic noncoding RNA.

Xiaoyong Sun, Simon M Lin, Xiaoyan Yan.

Abstract

NAGNAG alternative splicing plays an essential role in a myriad of biological processes and represents a highly adaptable system for posttranslational regulation of gene function. Previous studies of NAGNAG largely focused on messenger RNA. To the best of our knowledge, this is the first study testing the hypothesis that NAGNAG alternative splicing is also operative in large intergenic noncoding RNA (lincRNA). RNA-seq data sets from recent deep sequencing studies were queried to test our hypothesis. NAGNAG alternative splicing of human lincRNA was identified in two independent RNA-seq data sets. Within these data sets, 31 NAGNAG alternative splicing sites were identified in lincRNA. Notably, most exons of lincRNA containing NAGNAG acceptors were longer than those from protein-coding genes. Furthermore, the presence of CAG appeared to participate in splice site selection. Finally, expression of the isoforms of NAGNAG lincRNA exhibited tissue specificity. Together, this study improves our understanding of NAGNAG alternative splicing in lincRNA.

Year:  2014        PMID: 24995327      PMCID: PMC4068082          DOI: 10.1155/2014/736798

Source DB:  PubMed          Journal:  Biomed Res Int            Impact factor:   3.411


1. Introduction

The NAGNAG alternative splicing mechanism is a process which facilitates alternative protein expression from a single gene. Analysis of deep RNA-sequencing data by Bradley et al. (2012) confirmed that NAGNAG is highly regulated [1]. NAGNAG alternative splicing specifically targets inclusion or exclusion of three nucleotides at 3′ splice sites (Figure 1), thus effecting a change in one or two amino acids encoded in the final protein [2-9]. Such amino acid substitutions have been shown to affect protein function and interfere with signaling [10], affect cellular localization [11], and impact on DNA and protein binding [12-14] in both plants and mammals. A role for NAGNAG alternative splicing was shown in human Stargardt disease [15] and has been implicated in other disease processes including cancer [16].
Figure 1

NAGNAG alternative splicing can result in two isoforms. The NAGNAG acceptors at the 3′-end can be either at site 1 or site 2, are three nucleotides apart, and exhibit the “NAGNAG” motif signature.

Large intergenic noncoding RNAs (lincRNAs) have traditionally been defined as long noncoding transcripts greater than 200 nucleotides in length. Overlapping isoforms of lincRNA have been reported previously and may include protein-coding genes [17]. Recently, while exploring the dynamic profiles of NAGNAG acceptors in Arabidopsis, we identified two isoforms originating from the same NAGNAG acceptors but located in noncoding RNA [18]. Previous studies have assumed that NAGNAG acceptors function through the classical mRNA paradigm, based on the observation of altered coding for one or two amino acids in the protein-coding gene. Based on our observation of NAGNAG acceptors in Arabidopsis noncoding RNA, we proposed an expanded paradigm and hypothesized that the NAGNAG alternative splicing mechanism also exists in lincRNA.

Bioinformatics has become a powerful tool for the study of alternative splicing and its functional consequences. To date, bioinformatic analyses have produced evidence of alternative splicing in approximately 80% of human genes [19]. Bioinformatic approaches have been invaluable for exploring comparative genomics across species, and such studies have produced important insights into the regulatory mechanisms governing splicing and its role in evolution and adaptation.

The single base-pair resolution offered by deep RNA sequencing motivated us to seek direct evidence of NAGNAG alternative splicing in lincRNA. To accomplish this goal, we applied computational approaches to two public data sets of deeply sequenced human tissue RNA-seq data whose content included previously annotated lincRNA. By aligning the two RNA-seq data sets and systematically screening, identifying, and quantifying the NAGNAG alternative splicing of lincRNA, 31 NAGNAG alternative splicing events in lincRNA were defined. Importantly, tissue-specific patterns of expression for NAGNAG isoforms in lincRNA were observed.

2. Methods

2.1. Data

RNA-seq data sets were downloaded from NCBI SRA (accession numbers for data sets 1 and 2: E-MTAB-513 and GSE30554). These RNA-seq data were generated by sequencing 8 individual human tissues and a mixture of 16 tissues (Illumina Body Map) on the Illumina HiSeq 2000 (Illumina, Inc.) platform. Each sample was deeply sequenced with more than 200 million reads and annotated for lincRNA. We kept only the high-quality reads using the FastX quality filter with the following criterion: a minimum Phred score of 20 over at least 80% of the sequence read.
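
The read-level quality criterion described above can be sketched in Python. This is only an illustration of the rule, not the FastX implementation: the function name `passes_quality_filter` is hypothetical, and a Phred+33 quality encoding is assumed because the text does not state the encoding.

```python
def passes_quality_filter(quality_string, min_phred=20, min_fraction=0.80, offset=33):
    """Return True if at least `min_fraction` of bases have Phred >= `min_phred`.

    `offset=33` is an assumption (Phred+33 encoding); older Illumina
    pipelines used an offset of 64.
    """
    if not quality_string:
        return False  # an empty read cannot satisfy the criterion
    scores = [ord(ch) - offset for ch in quality_string]
    good_bases = sum(1 for score in scores if score >= min_phred)
    return good_bases / len(scores) >= min_fraction


# 'I' encodes Phred 40 and '#' encodes Phred 2 under Phred+33.
print(passes_quality_filter("IIII#"))  # 4 of 5 bases pass the threshold
print(passes_quality_filter("III##"))  # only 3 of 5 bases pass
```

A read is kept when the fraction of bases at or above the threshold reaches 80%, so the first toy read is retained and the second is discarded.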

2.2. Alignment, Screening, and Quantification

Annotations of human lincRNA were obtained from the Human lincRNA Catalog hosted at the Broad Institute [20]. All RNA-seq data sets were aligned to lincRNA with TopHat [21] using the option “--max-multihits 1”, which permits only unique mapping. The anchor length was set at 8 nt and the number of mismatches allowed in the anchor region at 0 to avoid alignment bias. After the data were aligned, SAMtools was used to store, sort, and index the binary SAM data (BAM files) (http://samtools.sourceforge.net) [22]. To identify lincRNA containing NAGNAG alternative splicing sites, we screened the lincRNA sequences using the classical expression of the “NAGNAG” motif. Alignment of RNA-seq reads to the NAGNAG splicing junctions was used to confirm the existence of the splice sites. We required at least four junction reads with the same 5′ splice site, stipulating that two needed to match the first NAGNAG splice site (site 1) while the other two were required to match the second NAGNAG splice site (site 2) [23, 24]. The splice sites and their flanking sequences, 30 bp from the intron and 30 bp from the exon, were extracted from the hg19 genome sequence with the Bioconductor package Biostrings (R package version 2.22.0) and screened for potential patterns. Sequence logos were drawn with WebLogo using default parameters as described previously [25]. The ratio of isoform expression at the two alternative splice sites (site 1 and site 2) was calculated as log(read counts at site 1 / read counts at site 2). If the expression of isoform 1 was higher than that of isoform 2, ratio > 0; otherwise, ratio < 0. NAGNAG acceptors were grouped into four categories based on this ratio and the strand information. To quantify RNA expression levels, all RNA-seq counts were normalized using reads per million (RPM).
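
The motif screen, the junction-read requirement, and the log ratio described above can be illustrated with a short Python sketch. It is not the authors' pipeline (which used R/Bioconductor). The function names are hypothetical, the regular expression is one plausible reading of the “classical expression” of the motif, and the natural logarithm is assumed because the text does not give the base.

```python
import math
import re

# N-A-G-N-A-G, where N is any nucleotide; the lookahead also reports
# overlapping occurrences such as CAGCAGCAG.
NAGNAG_PATTERN = re.compile(r"(?=([ACGT]AG[ACGT]AG))")


def find_nagnag(sequence):
    """Return (0-based position, motif) for every NAGNAG occurrence."""
    sequence = sequence.upper()
    return [(m.start(), m.group(1)) for m in NAGNAG_PATTERN.finditer(sequence)]


def has_junction_support(site1_reads, site2_reads, min_per_site=2):
    """At least two junction reads at each of the two splice sites."""
    return site1_reads >= min_per_site and site2_reads >= min_per_site


def log_ratio(site1_reads, site2_reads):
    """log(site 1 / site 2); positive when site 1 is used more often."""
    return math.log(site1_reads / site2_reads)


print(find_nagnag("TTCAGCAGGT"))
print(has_junction_support(3, 1))
print(log_ratio(10, 5) > 0)
```

Both read counts passed to `log_ratio` must be positive, which the junction-read requirement guarantees for confirmed acceptors.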
The expression level of NAGNAG isoforms in lincRNA was calculated from read counts with the Bioconductor packages Rsamtools (R package version 1.6.3) and IRanges (R package version 1.12.6). Duplicate reads were kept for quantification purposes. NAGNAG motifs were only designated as NAGNAG acceptors if both splice sites exhibited more than 2 reads in at least two samples. To avoid ambiguity, we discarded those NAGNAG acceptors located in the overlapping area between lincRNAs and annotated genes.
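
A minimal Python sketch of the RPM normalization and the acceptor designation rule follows. The function names are hypothetical, and the rule is read here as "both sites exceed 2 reads within the same sample, in at least two samples", which is one interpretation of the sentence above.

```python
def reads_per_million(count, total_mapped_reads):
    """Normalize a raw read count to reads per million (RPM)."""
    return count * 1_000_000 / total_mapped_reads


def is_nagnag_acceptor(site1_counts, site2_counts, min_reads=2, min_samples=2):
    """Apply the designation rule to per-sample read counts.

    `site1_counts` and `site2_counts` list the counts for the same samples
    in the same order. A sample supports the acceptor when both sites have
    more than `min_reads` reads (an assumed reading of the rule).
    """
    supporting_samples = sum(
        1
        for c1, c2 in zip(site1_counts, site2_counts)
        if c1 > min_reads and c2 > min_reads
    )
    return supporting_samples >= min_samples


print(reads_per_million(200, 200_000_000))
print(is_nagnag_acceptor([3, 5, 0], [4, 3, 9]))
```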

2.3. Quantification of Tissue-Specific NAGNAG Acceptors

To analyze the relationship between the ratio of the two NAGNAG splice sites and the tissues, we used the Bioconductor package limma with the linear model

$$Y_{ijk} = \alpha_i + \beta_j + \varepsilon_{ijk}$$

where $Y_{ijk}$ represents the log ratio of the two NAGNAG splice sites from the same NAGNAG acceptor, with NAGNAG acceptor $i$, tissue $j$, and sample $k$; $\alpha_i$ represents the main effect of the $i$th NAGNAG acceptor; $\beta_j$ represents the main effect of the $j$th tissue; and $\varepsilon_{ijk}$ represents the measurement error. The NAGNAG acceptors were selected using false discovery rate (FDR)-adjusted P values < 0.05.
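
The FDR selection step can be illustrated in Python. The sketch assumes the Benjamini-Hochberg procedure, which the text does not name explicitly, and the P values in the example are invented for illustration only.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted P values in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])  # indices by ascending P
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


# Hypothetical raw P values for four acceptors (not from the study).
raw = [0.01, 0.04, 0.03, 0.20]
adj = benjamini_hochberg(raw)
selected = [i for i, p in enumerate(adj) if p < 0.05]
print(adj, selected)
```

Only acceptors whose adjusted P value falls below 0.05 would be reported as tissue-specific under this rule.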

3. Results

Two novel observations were documented. First, mapping of unique reads to the potential NAGNAG alternative splicing sites in human lincRNA demonstrated existence of NAGNAG alternative splicing in lincRNA (Table 1). Of the 1320 lincRNAs containing the NAGNAG motif, presence of NAGNAG acceptors was confirmed with RNA-seq data in 30 lincRNAs. These 31 NAGNAG acceptors originate from 30 transcripts. Interestingly, linc-POLR3G-10 exhibited two NAGNAG acceptors located in two distinct transcripts: TCONS_00010012 and TCONS_00010010. Presence of two NAGNAG acceptors was identified in the upstream region of the fourth and fifth exons of this 5-exon gene. In addition, 8 NAGNAG acceptors were identified within the overlapping regions between lincRNA and protein-coding RNA but were not further considered in this study (see Supplementary Data 1 in Supplementary Material available online at http://dx.doi.org/10.1155/2014/736798).
Table 1

NAGNAG acceptors in lincRNA confirmed by RNA-seq.

| Transcript ID | linc name | chr | Site 1 | Site 1 existence | Site 2 | Site 2 existence | Strand | Neighbouring gene |
|---|---|---|---|---|---|---|---|---|
| TCONS_00000929 | linc-CMPK1-3 | chr1 | 47645537 | Data 1, 2 | 47645540 | Data 1, 2 | + | CMPK1 |
| TCONS_00001552 | linc-CTBS-1 | chr1 | 85084564 | Data 1 | 85084567 | Data 1 | - | CTBS |
| TCONS_00002502 | linc-CRP-1 | chr1 | 159746542 | Data 1 | 159746545 | Data 1 | - | CRP |
| TCONS_00002232 | linc-IARS2-3 | chr1 | 219414541 | Data 1, 2 | 219414544 | Data 1 | + | IARS2 |
| TCONS_00018502 | linc-GDF10-1 | chr10 | 48515547 | Data 1 | 48515550 | Data 1 | - | GDF10 |
| TCONS_00021357 | linc-BEST3-1 | chr12 | 70124473 | Data 1 | 70124476 | Data 2 | - | BEST3 |
| TCONS_00020623 | linc-TMEM132C-14 | chr12 | 126580786 | Data 1, 2 | 126580789 | Data 1 | + | TMEM132C |
| TCONS_00023051 | linc-DIO3-8 | chr14 | 101363930 | Data 2 | 101363933 | Data 2 | + | DIO3 |
| TCONS_00023721 | linc-ANP32A-1 | chr15 | 69753046 | Data 2 | 69753049 | Data 1, 2 | - | ANP32A |
| TCONS_00023791 | linc-FAM174B-1 | chr15 | 93325371 | Data 1, 2 | 93325374 | Data 1 | - | FAM174B |
| TCONS_00023799 | linc-RGMA-7 | chr15 | 95753867 | Data 1 | 95753870 | Data 1 | - | RGMA |
| TCONS_00024399 | linc-CHD9-6 | chr16 | 51806287 | Data 1 | 51806290 | Data 1 | + | CHD9 |
| TCONS_00025631 | linc-NR1D1-1 | chr17 | 38277663 | Data 1 | 38277666 | Data 1, 2 | - | NR1D1 |
| TCONS_00025146 | linc-VEZF1-1 | chr17 | 56066627 | Data 1, 2 | 56066630 | Data 1, 2 | - | VEZF1 |
| TCONS_00026560 | linc-NETO1-1 | chr18 | 71351479 | Data 1, 2 | 71351482 | Data 1, 2 | - | NETO1 |
| TCONS_00027051 | linc-ZNF227-1 | chr19 | 44700207 | Data 1 | 44700210 | Data 1 | + | ZNF227 |
| TCONS_00004960 | linc-ITGA4-2 | chr2 | 181940923 | Data 1 | 181940926 | Data 1 | + | ITGA4 |
| TCONS_00003507 | linc-GPR55-1 | chr2 | 231856751 | Data 1, 2 | 231856754 | Data 2 | - | GPR55 |
| TCONS_00029585 | linc-RASD2-1 | chr22 | 35850418 | Data 1 | 35850421 | Data 1 | + | RASD2 |
| TCONS_00005471 | linc-TMEM14E-2 | chr3 | 153103176 | Data 1 | 153103179 | Data 1 | - | TMEM14E |
| TCONS_00007527 | linc-SPATA18-1 | chr4 | 52912845 | Data 1, 2 | 52912848 | Data 1, 2 | + | SPATA18 |
| TCONS_00009387 | linc-OSMR-1 | chr5 | 38792356 | Data 1 | 38792359 | Data 1 | + | OSMR |
| TCONS_00010010 | linc-POLR3G-10 | chr5 | 87581558 | Data 1, 2 | 87581561 | Data 1, 2 | + | POLR3G |
| TCONS_00010012 | linc-POLR3G-10 | chr5 | 87583253 | Data 1, 2 | 87583256 | Data 1 | + | POLR3G |
| TCONS_00009724 | linc-LYSMD3-2 | chr5 | 90610061 | Data 1, 2 | 90610064 | Data 1, 2 | - | LYSMD3 |
| TCONS_00010581 | linc-MGAT1-2 | chr5 | 180257403 | Data 1 | 180257406 | Data 1 | - | MGAT1 |
| TCONS_00012396 | linc-PSMG4-1 | chr6 | 3257557 | Data 1 | 3257560 | Data 1 | + | PSMG4 |
| TCONS_00011322 | linc-FAM135A-1 | chr6 | 71104930 | Data 1 | 71104933 | Data 1 | + | FAM135A |
| TCONS_00012862 | linc-FAM20C-2 | chr7 | 153409 | Data 1 | 153412 | Data 1 | + | FAM20C |
| TCONS_00014103 | linc-SEPT7-1 | chr7 | 35756638 | Data 1 | 35756641 | Data 1, 2 | + | SEPT7 |
| TCONS_00014833 | linc-UTP23-3 | chr8 | 112757671 | Data 1, 2 | 112757674 | Data 2 | + | UTP23 |

Most exons in lincRNA containing NAGNAG acceptors were longer than those in protein-coding genes (Wilcoxon rank sum test, P value < 2.2e-16). The average exon length of protein-coding genes was 306 ± 702 bp and the average neighbouring intron length was 6092 ± 19983 bp (Supplementary Figure S1), compared with 349 ± 630 bp and 8476 ± 19751 bp, respectively, for lincRNA. Most tandem acceptors of lincRNA occurred at the furthest exon, that is, the second exon of the lincRNA (mean: 2.52; sd: 0.71), whereas those in protein-coding genes were located centrally among the exons of the gene (mean: 10.7; sd: 8.8).

The most prevalent triplet found among the lincRNA sequences was CAG for both splice sites, with GAG present at the lowest frequency (Supplementary Table S1). The CAGCAG and CAGAAG combinations occurred at the highest frequency. The presence of CAG was positively correlated with the expression level of the corresponding splice site. Specifically, a predilection for the first splice site was noted when CAG was encoded at the first NAG site (ratio > 0, Figure 2). Alternatively, when CAG was located at the second NAG position or was absent from the splice site altogether, the second NAG was favoured for splicing (ratio < 0, Figure 2).
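
The triplet tally behind this kind of frequency comparison can be sketched in Python. The function name is hypothetical and the motif list below is a toy example, not the study's data (those counts are in Supplementary Table S1).

```python
from collections import Counter


def triplet_counts(motifs):
    """Count the first and second NAG triplets and the full hexamers."""
    first = Counter(motif[:3] for motif in motifs)
    second = Counter(motif[3:] for motif in motifs)
    combined = Counter(motifs)
    return first, second, combined


# Toy NAGNAG motifs, invented for illustration only.
toy_motifs = ["CAGCAG", "CAGAAG", "TAGCAG", "CAGCAG"]
first, second, combined = triplet_counts(toy_motifs)
print(first.most_common(1), second.most_common(1), combined.most_common(1))
```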
Figure 2

Sequence logos for 30 bp flanking sequences for 3′ splice sites. The logos are divided into four groups based on the chromosome strand and ratio of read counts of site 1 to site 2.

The second novel observation was the tissue specificity of 6 NAGNAG acceptors in lincRNA (FDR-adjusted P value < 0.05). Figure 3 shows that 6 of 31 NAGNAG acceptors exhibited statistically significant differences in expression levels across diverse tissues. Specifically, as seen in Figure 3, the first NAG splice site was preferentially used at the NAGNAG acceptor chr5:87583253-87583256_+ from TCONS_00010012. This preference was associated with a clear expression pattern in several tissues including lymph node, lung, and kidney, and this signature was remarkably consistent. A similar pattern was noted for the alternative splice site: the second NAG splice site was preferentially used at the NAGNAG acceptor chr15:95753867-95753870_-. This distinctive expression pattern was clearly evident in ovary. The remaining 25 NAGNAG acceptors were either not detected or exhibited no difference in expression pattern across most tissues.
Figure 3

Heat map for the ratio of the NAGNAG isoforms at the two alternative splice sites (site 1 and site 2). Rows represent the 31 NAGNAG acceptors and columns represent the tissues. Ratio > 0: site 1 is preferred. Ratio < 0: site 2 is preferred.

4. Discussion

Splice sites are pivotal factors in the splicing process [26]. NAGNAG alternative splicing was identified in the past decade and is characterized by inclusion or exclusion of three nucleotides at 3′ splice sites, resulting in substitutions in one or two amino acids in the protein products. Previous studies have shown that this type of alternative splicing is highly regulated and related to proteome evolution [1]. Functionally, NAGNAG alternative splicing in mRNA results in various isoforms which generate alternative proteins following translation. To the best of our knowledge, the present study provides the first evidence that NAGNAG alternative splicing can be observed not only in mRNA but also in lincRNA. Although alternative splicing of lincRNA was reported previously [20], the report of NAGNAG alternative splicing is novel.

Following analysis of two RNA-seq data sets including annotations for lincRNA, we identified 31 NAGNAG acceptors in lincRNA. These 31 NAGNAG acceptors originated from 30 transcripts. Interestingly, a role for the “CAG” sequence in splice site selection was suggested, with CAG being the most prevalent triplet found among the lincRNA sequences for both splice sites. GAG was present at the lowest frequency, and the CAGCAG and CAGAAG combinations occurred at the highest frequency. A predilection for the first splice site was noted when CAG was encoded at the first NAG site. The second NAG was favoured for splicing when CAG was located at the second NAG position or was absent altogether. This finding is consistent with previous reports on mRNA [27].

Traditionally, lincRNA has been defined as transcripts exceeding 200 nucleotides in length which do not encode putative functional protein products [28]. lincRNA has been posited to play a role in splicing processes [29] and has been reported to contain predominantly two exons [30]. In the current study, most exons from lincRNA containing NAGNAG acceptors were longer than those from protein-coding genes. Most tandem acceptors of lincRNA identified in the present study occurred at the furthest exon, that is, the second exon of the lincRNA. By contrast, those found in protein-coding genes have generally been located centrally among the exons of the gene.

The mechanism of NAGNAG alternative splicing is not completely understood. Hiller and colleagues [3] suggested that these NAGNAG acceptors are not random noise because some fraction of NAGNAG acceptors is tissue-specific, although this view was not universally shared [6, 8]. However, Bradley et al. provided solid evidence in support of tissue specificity based on RNA-seq analysis of 16 human and 8 mouse tissues, wherein they demonstrated that at least 25% of NAGNAG acceptors in mRNA were regulated in a tissue-specific manner [1]. This percentage exceeded earlier estimates for tissue specificity [27]. Analysis of our selected data sets revealed a low level of consistent tissue-specific patterns among NAGNAG acceptors in lincRNA. Among the 19% of NAGNAG acceptors (6 of 31) that exhibited distinct differences in expression levels across tissues, two showed preferential use of a specific splice site.

There are some limitations of this computational study. First, the annotation data were limited to the Human lincRNA Catalog at the Broad Institute [20], although other annotations of human lincRNA are also available [30]. More information about lincRNA will help to identify more NAGNAG alternative splicing. Second, the biological significance and potential disease impact of NAGNAG alternative splicing were only projected computationally and await confirmation through further proteomic studies. For example, gene ontology analysis of the genes neighbouring the lincRNAs with NAGNAG acceptors indicated that these genes were all functionally engaged in transcription regulation (ANP32A, CHD9, NR1D1, POLR3G, VEZF1, ZNF227) and signalling (CRP, CTBS, FAM174B, FAM20C, GDF10, ITGA4, NETO1, RGMA, TMEM132C, OSMR). Further, analysis of potential disease associations of the neighbouring genes revealed that these genes represented candidate genes associated with risk for many important diseases, including hypertension, obesity, and cancer, among others (see Supplementary Table S2 for a complete list).

Importantly, bioinformatics analysis has proved to be an invaluable tool in the investigation of alternative splicing from numerous perspectives, including microarray analysis, alternative splicing prediction utilizing comparative genomic approaches, identification and depiction of isoform and splicing patterns, definition of the regulation of alternative splicing, delineation of functional impact, and its role in defining evolutionary and adaptive processes, among other investigations [19]. To delineate alternative splicing in lincRNA, further investigations applying bioinformatic, genetic, and proteomic approaches are essential to unravel its functional and regulatory roles. The evolutionary aspect of lincRNA NAGNAG alternative splicing across different species can also be studied in the future.

Supplementary Figure S1: The exon and intron length of protein-coding gene versus lincRNA.
Supplementary Table S1: Occurrence of NAGNAG acceptors in human long non-coding RNA.
Supplementary Table S2: Disease association of the lincRNA-neighboring genes.
Supplementary Data 1: lincRNA that overlap with protein-coding RNA.

  30 in total

1.  Alternative splicing of prosystemin pre-mRNA produces two isoforms that are active as signals in the wound response pathway.

Authors:  L Li; G A Howe
Journal:  Plant Mol Biol       Date:  2001-07       Impact factor: 4.076

2.  WebLogo: a sequence logo generator.

Authors:  Gavin E Crooks; Gary Hon; John-Marc Chandonia; Steven E Brenner
Journal:  Genome Res       Date:  2004-06       Impact factor: 9.043

3.  Widespread occurrence of alternative splicing at NAGNAG acceptors contributes to proteome plasticity.

Authors:  Michael Hiller; Klaus Huse; Karol Szafranski; Niels Jahn; Jochen Hampe; Stefan Schreiber; Rolf Backofen; Matthias Platzer
Journal:  Nat Genet       Date:  2004-10-31       Impact factor: 38.330

Review 4.  Bioinformatics analysis of alternative splicing.

Authors:  Christopher Lee; Qi Wang
Journal:  Brief Bioinform       Date:  2005-03       Impact factor: 11.622

5.  The 2588G-->C mutation in the ABCR gene is a mild frequent founder mutation in the Western European population and allows the classification of ABCR mutations in patients with Stargardt disease.

Authors:  A Maugeri; M A van Driel; D J van de Pol; B J Klevering; F J van Haren; N Tijmes; A A Bergen; K Rohrschneider; A Blankenagel; A J Pinckers; N Dahl; H G Brunner; A F Deutman; C B Hoyng; F P Cremers
Journal:  Am J Hum Genet       Date:  1999-04       Impact factor: 11.025

6.  Evolutionary conservation of minor U12-type spliceosome between plants and humans.

Authors:  Zdravko J Lorkovic; Reinhard Lehner; Christina Forstner; Andrea Barta
Journal:  RNA       Date:  2005-07       Impact factor: 4.942

7.  An alternative splicing event in the Pax-3 paired domain identifies the linker region as a key determinant of paired domain DNA-binding activity.

Authors:  K J Vogan; D A Underhill; P Gros
Journal:  Mol Cell Biol       Date:  1996-12       Impact factor: 4.272

8.  Genome-wide study of NAGNAG alternative splicing in Arabidopsis.

Authors:  Yanjing Shi; Guangli Sha; Xiaoyong Sun
Journal:  Planta       Date:  2013-10-06       Impact factor: 4.116

9.  Two alternatively spliced forms of the human insulin-like growth factor I receptor have distinct biological activities and internalization kinetics.

Authors:  G Condorelli; R Bueno; R J Smith
Journal:  J Biol Chem       Date:  1994-03-18       Impact factor: 5.157

10.  Identification of alternatively spliced mRNA variants related to cancers by genome-wide ESTs alignment.

Authors:  Lijian Hui; Xin Zhang; Xin Wu; Zhixin Lin; Qingkang Wang; Yixue Li; Gengxi Hu
Journal:  Oncogene       Date:  2004-04-15       Impact factor: 9.867
