Short intron-derived ncRNAs.

Florent Hubé1,2, Damien Ulveling1,2, Alain Sureau1,2, Sabrina Forveille1,2, Claire Francastel1,2.   

Abstract

Introns represent almost half of the human genome, although they are eliminated from transcripts through RNA splicing. Yet, different classes of non-canonical miRNAs have been proposed to originate directly from intron splicing. Here, we considered the alternative splicing of introns as an interesting source of miRNAs, compatible with a developmental switch. We report computational prediction of new Short Intron-Derived ncRNAs (SID), defined as precursors of smaller ncRNAs like miRNAs and snoRNAs produced directly by splicing, and tested their dependence on each key factor in canonical or alternative miRNAs biogenesis (Drosha, DGCR8, DBR1, snRNP70, U2AF65, PRP8, Dicer, Ago2). We found that about half of predicted SID rely on debranching of the excised intron-lariat by the enzyme DBR1, as proposed for mirtrons. However, we identified new classes of SID for which miRNAs biogenesis may rely on intermingling between canonical and alternative pathways. We validated selected SID as putative miRNAs precursors and identified new endogenous miRNAs produced by non-canonical pathways, including one hosted in the first intron of SRA (Steroid Receptor RNA activator). Consistent with increased SRA intron retention during myogenic differentiation, release of SRA intron and its associated mature miRNA decreased in cells from healthy subjects but not from myotonic dystrophy patients with splicing defects.
© The Author(s) 2017. Published by Oxford University Press on behalf of Nucleic Acids Research.

Year:  2017        PMID: 28053119      PMCID: PMC5416886          DOI: 10.1093/nar/gkw1341

Source DB:  PubMed          Journal:  Nucleic Acids Res        ISSN: 0305-1048            Impact factor:   16.971


INTRODUCTION

The discovery of non-coding RNAs (ncRNA) and the variety of molecular processes in which they have been implicated suggest that they increase and diversify the amount of regulatory molecules available in the cell. In contrast to housekeeping or infrastructural ncRNAs, which are normally constitutively expressed and required for normal function and viability of the cell, regulatory ncRNAs are expressed in response to external stimuli or at particular stages of development and cell differentiation, and can affect the expression of other genes at the level of transcription or translation (1). Among those, microRNAs (miRNA) are a class of naturally occurring small ncRNAs, about 20–25 nucleotides (nt) in length, which have been identified in almost all eukaryotic cells. MiRNAs are post-transcriptional regulators that bind to complementary sequences on target mRNA, usually resulting in translational repression or target degradation and thus, gene silencing (2). The human genome may encode over 1000 miRNAs, targeting ∼60% of gene products in mammals. Therefore, because they affect gene regulation and are often deregulated in human diseases, their systematic identification has been the focus of many experimental and computational analyses [for reviews see (3,4)]. Canonical miRNAs are generated in a two-step processing pathway, mediated by two major enzymatic complexes containing the RNAse III-family of endonucleases Drosha and Dicer. Drosha, together with DGCR8, is part of the microprocessor multiprotein complex that mediates nuclear processing of the primary miRNA into stem–loop precursors of ∼60–70 nt (pre-miRNA). Exportin-5 (XPO5) mediates the nuclear export of correctly processed miRNA precursors. In the cytoplasm, the pre-miRNA is cleaved by Dicer into the mature 20–25 nt miRNA, which is then incorporated as single-stranded RNA into a ribonucleoprotein complex containing Argonaute 2 (Ago2) protein, known as the RNA-induced silencing complex (RISC). 
This RISC complex directs the miRNA to its target mRNA leading to its translational repression or its degradation [for a review see (5)]. It has also been postulated that regulatory RNA molecules could originate from the introns of protein-coding genes as functional by-products (6–8). Several recent studies indeed uncovered an atypical pathway to generate miRNA precursors in a way that bypasses the Drosha/DGCR8 complex (9–16). Instead, the pre-miRNA-like hairpins are produced by the action of the splicing machinery followed by lariat-debranching by the enzyme DBR1. The 5΄- and/or the 3΄-tails around the hairpin are then trimmed by the RNA exosome (12). The mirtron pathway merges with the canonical miRNA pathway at hairpin export by XPO5, and subsequent processing of hairpins by Dicer. These unusual miRNA precursors are called mirtrons owing to their embedding into introns of coding and non-protein coding genes (9,14,16). Only a few mirtrons have been described to date although they have been shown to exist from drosophila to humans (9–16). More recently, two other unconventional pathways have been described; the simtron pathway involves a small nuclear ribonucleoprotein (snRNP), which is part of the spliceosome complex snRNP70 that directly recruits Drosha on the stem–loop hairpin formed by the pre-mRNA, independently of DGCR8 (17,18). After cleavage by Drosha, pre-miRNAs are exported to the cytoplasm. The agotron pathway is more intricate given that the spliced introns are processed by Ago2 directly in the nucleus. Hence, miRNAs generated by the agotron pathway are processed independently of Drosha and Dicer (19). Interestingly, several miRNAs have been described to be processed from snoRNAs (20) or tRNAs (21) [for a review see (22)] and the description of non-canonical pathways has accumulated over the years (15,17,19,23–26). 
As discussed above, many new unconventional miRNAs, and most if not all small nucleolar RNAs (snoRNAs), are produced from introns through splicing mechanisms. Introns, which represent about half of the genome in mammals, have long been considered ‘junk’ or ‘dark matter’ because they are degraded within minutes of their excision; nevertheless, an increasing number of studies have shown the importance of introns in producing small ncRNAs. The possibility that more remain to be discovered, expressed at lower levels or in more specialized cellular contexts, calls for the exploitation of genome sequencing information to accelerate their discovery and ease their structural characterization. We propose herein the bioinformatic identification of human candidate intron-derived small ncRNA precursors through data mining of publicly available information provided by high-throughput sequencing projects. We use the term SID as an acronym for Short Intron-Derived ncRNAs, i.e. precursors of smaller ncRNAs like miRNAs and snoRNAs that are produced directly from splicing, regardless of their biogenesis pathway. Hence, this term may group precursors of snoRNAs that are already known to originate from splicing of introns (27,28), non-canonical precursors of miRNAs like agotrons and mirtrons (15,19), and other classes of splicing-dependent miRNA precursors yet to be identified. We focused on small introns (shorter than 200 nt) since they are more prone to alternative splicing (29). We provide a list of 56 new human SID, extracted from databases of alternative splicing events and matching entries in databases of human small RNAs. We further validated experimentally that these SID candidates were indeed detectable, in a panel of six cell lines and primary cells, as discrete and persistent entities. 
Among the highest detectable candidates in human cells, we validated six of them, which indeed produced a miRNA with variable dependence on known factors of miRNAs pathways. We employed several independent approaches to characterize the miRNA of one of these new SID that is hosted in the first intron of SRA (Steroid Receptor RNA activator). Of particular interest, we already showed that alternative splicing serves to increase the transcriptional output of the SRA1 gene, by producing a protein-coding gene when this intron is excised or a functional non-coding RNA when this intron is retained, disrupting the open reading frame (ORF), concomitant with myogenic differentiation (30,31). We now show that splicing of intron 1 of SRA generates a new miRNA with high levels in undifferentiated myogenic progenitors. As a whole, this work represents the first experimental validation of human endogenous SID and identified new classes of small ncRNAs of intron origin. It also puts forward that alternative splicing might represent an attractive source of small regulatory RNAs compatible with a tissue-specific or differentiation-stage specific role, which will provide new actors or biomarkers deregulated in disease.

MATERIALS AND METHODS

Data mining and data sets

Datasets of ‘Human Introns’, ‘Alternative Events’, ‘CpG island’, ‘SNP’ and ‘Genes’ were retrieved from the UCSC table browser, using the ‘Genes and gene prediction’, ‘Alt events’, ‘Regulation’ and ‘RefSeq Genes’ tracks from the hg19 assembly covering the whole human genome. Using Galaxy and the ‘Operate on Genomic Intervals’ tool, we selected introns shorter than 200 nucleotides and 100% covered in the ‘Alternative Events’ datasets. The location of intron retention relative to the start and stop codons was obtained by genomic coordinate comparison (from the ‘Genes’ dataset). Palindrome formation of selected introns was assessed using the ‘palindrome search’ tool (parameters: minimum length of palindrome: 18; maximum length of palindrome: 28; maximum gap between repeated regions: >100; number of mismatches allowed: <6). Finally, putative SID (Short Intron-Derived ncRNA) and non-SID introns were manually curated to remove potential redundancy. All excluded introns (1713), i.e. those that did not present the ability to form a palindrome, were used to construct the NI (normal introns) dataset. The miRNA dataset (1426) was downloaded from miRBase (http://www.mirbase.org/; release 17), the snoRNA dataset (402) was downloaded from snoRNABase (https://www-snorna.biotoul.fr; v3) and a random dataset (10000) was generated using the Random DNA Sequence Generator software. Data for small RNA-seq were obtained from the ENCODE/Cold Spring Harbor Lab track from the UCSC table browser (GEO DataSet number GSE24565; part of the BioProject 30709) and consist of NextGen sequencing information for RNAs between 20–200 nt in size, isolated from RNA samples from tissues or sub-cellular compartments of cell lines (K562, GM12878 and prostate cells).
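As an illustration of the interval-selection step described above, the following Python sketch keeps introns shorter than 200 nt that are entirely contained in an alternative-event interval. It is not the Galaxy workflow used in the study: the `Interval` type, the function names and the simplification that a single event must span the whole intron are assumptions made for this example.

```python
from typing import List, NamedTuple


class Interval(NamedTuple):
    """Genomic interval in BED-like coordinates (hypothetical helper type)."""
    chrom: str
    start: int  # 0-based, inclusive
    end: int    # exclusive


def is_fully_covered(intron: Interval, events: List[Interval]) -> bool:
    """True if one event on the same chromosome spans the whole intron.

    Assumption: coverage by a union of several partial events is ignored.
    """
    return any(
        e.chrom == intron.chrom and e.start <= intron.start and intron.end <= e.end
        for e in events
    )


def select_short_alt_introns(
    introns: List[Interval],
    events: List[Interval],
    max_len: int = 200,
    min_len: int = 0,
) -> List[Interval]:
    """Keep introns with min_len < length < max_len that are 100% covered."""
    selected = []
    for intron in introns:
        length = intron.end - intron.start
        if min_len < length < max_len and is_fully_covered(intron, events):
            selected.append(intron)
    return selected


introns = [
    Interval("chr1", 100, 250),    # 150 nt, covered
    Interval("chr1", 1000, 1300),  # 300 nt, too long
    Interval("chr2", 100, 250),    # 150 nt, no event on chr2
]
events = [Interval("chr1", 90, 260)]
short_alt = select_short_alt_introns(introns, events)
```

Setting `min_len=50` would additionally apply a lower size bound of the kind mentioned in the Results section.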

Datasets analysis

All tools were from The European Molecular Biology Open Software Suite EMBOSS. GC content was calculated using ‘infoseq’; CpG islands were mapped using ‘CpG islands’ track from UCSC browser. Single Nucleotide Polymorphisms were searched using the dbSNP132 track from UCSC browser. RNALfold, which calculates local stable secondary structures of RNAs, was used with default parameters. Minimum free energies (MFE, kcal/mol) were corrected relative to the size of the transcripts.
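The two per-sequence quantities used here, G+C content and length-corrected MFE, can be sketched in a few lines of Python. This is only an illustration: the function names are hypothetical, the MFE value is assumed to have been obtained separately (for example from RNALfold output), and the correction shown is the absolute MFE divided by sequence length.

```python
def gc_content(seq: str) -> float:
    """Percentage of G and C bases in a nucleotide sequence."""
    seq = seq.upper()
    if not seq:
        raise ValueError("empty sequence")
    return 100.0 * (seq.count("G") + seq.count("C")) / len(seq)


def corrected_mfe(mfe_kcal_per_mol: float, length: int) -> float:
    """Absolute minimum free energy divided by sequence length (kcal/mol/nt)."""
    if length <= 0:
        raise ValueError("length must be positive")
    return abs(mfe_kcal_per_mol) / length


example_gc = gc_content("GGCCAATT")      # 4 of 8 bases are G or C
example_mfe = corrected_mfe(-30.0, 100)  # hypothetical MFE for a 100-nt sequence
```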

Software and databases

Software and web servers used were as follows: The Mfold web server v2.3, http://mfold.rna.albany.edu; UCSC genome Browser, http://genome.ucsc.edu; Galaxy, http://main.g2.bx.psu.edu; Genomatix Software Suite v2.1, http://www.genomatix.de; Venny, http://bioinfogp.cnb.csic.es; Random DNA sequence generator, http://users-birc.au.dk; miRBase v17, http://www.mirbase.org; snoRNABase v3, https://www-snorna.biotoul.fr; miRNA Primer Design Tool, http://genomics.dote.hu:8080/mirnadesigntool; RNALfold, http://www.tbi.univie.ac.at.

Cell culture

Primary human satellite cells (LHCN-M2, myoblast, MB) and their in vitro differentiated myotubes (MT) counterpart, HEK-293, K562, MCF-7 and MDA-MB-231 cells were grown as previously described (30,31). RNA from primary human satellite cells isolated from muscle biopsies of three foetuses showing clinical symptoms of the congenital Myotonic Dystrophy type 1 (DM1), and from one immortalized cell line (DM11), all carrying more than 2000 CTG repeats, were a gift from Denis Furling (Myology Institute, Paris, France) and were used previously (31,32).

Antibodies

Primary antibodies used were directed against Drosha (ab12286, Abcam), DGCR8 (ab90579, Abcam), DBR1 (sc-99369, Santa Cruz), snRNP70 (sc-9571 c-18, Santa Cruz), PRP8 (sc-55534 F-6, Santa Cruz), U2AF65 (sc-48804 H-300, Santa Cruz), Dicer (sc-56651), Ago2 (sc-32877 H-300, Santa Cruz) and γ-Tubulin (T6557, Sigma).

Plasmids and constructs

pSicoR human Dicer1 (ID 14763), pSicoR human Drosha1 (ID 14766) and pSicoR human DGCR8-1 (ID 14769) were obtained from Addgene (www.addgene.org). Short hairpin RNAs (shRNA) directed against human DBR1 (DBR1-1, DBR1-2 and DBR1-3), snRNP70+PRP8, U2AF65+PRP8, Ago2 (Ago2-1 and Ago2-2) and Luciferase were produced using the MessageMuter™ shRNA production kit (Epicentre Biotechnologies) and in vitro transcribed using the T7 RiboMax large-scale production system (Promega) following the manufacturer's instructions. Sequences are listed in Supplemental Table S2. Knockdown of snRNP70 or U2AF65 was combined with that of the major splicing factor PRP8, since interference with snRNP70 or U2AF65 alone was insufficient to reduce splicing significantly in another study (18).

RNA preparation

Total RNA was isolated using Trizol reagent (Life Technologies) according to the manufacturer's instructions and as previously described (31,33). Short and long RNAs were purified using mirVana™ miRNA Isolation Kit (Life Technologies) according to instructions.

Oligonucleotide array

Oligonucleotides, 45–55 nt in length and complementary to candidate SID, at a final concentration of 10 μM, were spotted on GeneScreen Plus (PerkinElmer) membranes using a 96 well manifold system (BioRad). To confirm the specificity of hybridization, we included a series of control oligonucleotides on the array. Briefly, the RNU6-1 (U6) oligonucleotide served as loading control, two randomly-generated oligonucleotides were used to set the baseline signal and two oligonucleotides matching constitutive exon 4 of GAPDH and exon 4 of SRA mRNA were used as size fractionation controls. All oligonucleotides were first phosphorylated using 1 U Polynucleotide kinase (New England Biolabs) per microgram of oligonucleotide for 2 h at 37°C followed by enzyme inactivation by addition of 1 vol. of 0.5 M EDTA. Membranes were briefly washed once with 1× MOPS–NaOH pH 7 and once with 0.1× SSC, 0.125 N NaOH. Oligonucleotides were denatured in 500 mM NaOH before being diluted in 0.1× SSC, 0.125 N NaOH and spotted onto membranes. Membranes were then washed twice in 1× MOPS–NaOH pH 7. Spotted oligonucleotides were cross-linked to the membrane using the 1-ethyl-3-(3-dimethylaminopropyl) carbodiimide (EDC) method described in (34). Arrays were not stored but used directly after spotting. Five micrograms of small RNAs freshly isolated from cell lines, transfected or not with shRNA, were preheated at 80°C for 3 min, cooled on ice, and labelled overnight at 37°C using 1 U poly (A) polymerase and 50 μCi 32P-α-ATP in the presence of 40 U RNAse Out (Life Technologies). Membranes were pre-hybridized in Ultrahyb®-oligo (Life Technologies) at 42°C for at least 30 min followed by an overnight hybridization in the same solution containing the RNA probe. Following hybridization, membranes were washed twice with 2× SSC/0.5% SDS at 42°C and once with 1× SSC/0.5% SDS. 
Membranes were exposed to a phosphor storage screen, scanned using Phosphor Imager (Fujifilm), and hybridization signals were quantified using Multi Gauge v3 (Fujifilm). Membranes were never stripped and used only once. Hybridization signals for each spot of the array and background values at eight empty spots were measured. Averaged hybridization signal at empty spots was subtracted from each other spot signal and normalized using the U6 signal. Signals from spots with random oligonucleotides were averaged and served as baseline value. Signals from candidate SID from among duplicate spots and triplicate arrays were averaged and considered positive when being over the baseline value. Hybridization with RNA from the different cell types was performed three times (n = 3), or four times (n = 4) following RNA interference in K562 cells.
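The quantification scheme described above (background subtraction, U6 normalization, baseline from random oligonucleotides, positive call above baseline) can be sketched as follows. This is a minimal illustration, not the Multi Gauge procedure itself; the data layout, spot names and function names are hypothetical.

```python
from statistics import mean
from typing import Dict, List


def normalize_array(
    raw: Dict[str, List[float]],
    empty_signals: List[float],
    u6_name: str = "U6",
) -> Dict[str, float]:
    """Subtract mean empty-spot background, then divide by the U6 signal.

    `raw` maps each spotted oligonucleotide to its duplicate-spot signals.
    """
    background = mean(empty_signals)
    corrected = {name: mean(v - background for v in vals) for name, vals in raw.items()}
    u6 = corrected[u6_name]
    if u6 <= 0:
        raise ValueError("U6 signal is not above background")
    return {name: value / u6 for name, value in corrected.items()}


def call_positive(
    arrays: List[Dict[str, float]],
    random_names: List[str],
    candidate_names: List[str],
) -> Dict[str, bool]:
    """Average candidates over replicate arrays and compare with the baseline."""
    baseline = mean(a[r] for a in arrays for r in random_names)
    return {c: mean(a[c] for a in arrays) > baseline for c in candidate_names}


# Toy values for one array; a real experiment would supply three arrays.
raw = {
    "U6": [110, 110],
    "rand1": [20, 20],
    "rand2": [30, 30],
    "SID_A": [60, 60],
    "SID_B": [15, 15],
}
normalized = normalize_array(raw, empty_signals=[10] * 8)
calls = call_positive([normalized], ["rand1", "rand2"], ["SID_A", "SID_B"])
```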

Northern blot

Northern blots of short RNA fractions were performed as described by Pall and Hamilton (34) using 5 μg of small RNAs purified as described above and probed with the 32P-γ-ATP radio-labelled oligonucleotides used in the macro-array.

Pocket-sized RNA-seq

The method was fully described in reference (35). Sequences used to produce human SRA intron 1 or luciferase baits are provided in Supplemental Table S2.

RT-PCR, RT-qPCR and stem–loop RT-qPCR

RNA was isolated as described above and reverse transcribed as described previously (30,31,33,36,37). Primers used for amplification are described in Supplemental Table S2. Radio-labeled PCR products were separated on denaturing polyacrylamide/urea gels as previously described (38). Following electrophoresis, the gels were dried and exposed to a phosphor storage screen, and scanned as above (n = 2). Quantitative PCR was carried out using the LightCycler® 480 SYBR Green I master on the LightCycler 480 real-time PCR system (Roche Diagnostics). For stem–loop RT-qPCR, reverse transcriptase reactions were performed replacing random hexamer primers with stem–loop primers. All reactions, including no-template controls and RT-minus controls, were run in duplicate (n = 3). Relative expression of genes was calculated using the comparative CT method (30,31), i.e. $2^{-[(\text{shMock sample} - \text{shMock control}) - (\text{shTarget sample} - \text{shTarget control})]}$. Expression of immunoprecipitated RNAs was calculated using the ‘relative to the irrelevant antibody’ method, i.e. $2^{-(\text{Ct specific Ab} - \text{Ct irrelevant Ab})}$, and was expressed as fold enrichment (n = 2). Mature miRNA-specific primers were designed using the miRNA Primer Design Tool website and normalization was performed relative to the signal of RNU6-1 (U6) amplification. All primers used in qPCR experiments had efficiencies >98%, well above the MIQE requirement.
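The two expressions given above can be written as small Python functions. This sketch transcribes the terms in the order in which they appear in the text; the function and argument names are hypothetical, and all inputs are assumed to be Ct values.

```python
def relative_expression(
    ct_mock_sample: float,
    ct_mock_control: float,
    ct_target_sample: float,
    ct_target_control: float,
) -> float:
    """Comparative Ct value, with terms ordered exactly as in the text:
    2^-((shMock sample - shMock control) - (shTarget sample - shTarget control)).
    """
    delta_delta_ct = (ct_mock_sample - ct_mock_control) - (
        ct_target_sample - ct_target_control
    )
    return 2.0 ** (-delta_delta_ct)


def fold_enrichment(ct_specific_ab: float, ct_irrelevant_ab: float) -> float:
    """RIP enrichment relative to the irrelevant antibody:
    2^-(Ct specific Ab - Ct irrelevant Ab).
    """
    return 2.0 ** (-(ct_specific_ab - ct_irrelevant_ab))


# Toy Ct values, chosen only to exercise the arithmetic.
enrichment = fold_enrichment(25.0, 28.0)
unchanged = relative_expression(24.0, 20.0, 24.0, 20.0)
```

Because each Ct unit corresponds to a twofold difference under ideal amplification efficiency, a specific-antibody Ct that is three cycles lower than the irrelevant-antibody Ct gives an eightfold enrichment.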

RNA and protein immunoprecipitation

Cells were lysed, RNA and proteins were extracted and quantified as described previously (31) and used for native RNA immunoprecipitation (RIP) or protein immunoprecipitation experiments, respectively. Protein extracts (1 mg) were incubated with the appropriate antibody (1 μg/mg of proteins) for 2 h at 4°C as described earlier (33,36). Total proteins were analyzed by SDS-PAGE and immunoblotting as previously described (36) or when appropriate, co-precipitated RNA was extracted using Trizol method, reverse transcribed and PCR amplified as described above (n = 2).

Transfection experiments

Plasmid constructs and shRNA were transiently transfected (n = 3) using jetPEI™ and interferin reagents (PolyPlus transfection), respectively, following the manufacturer's instructions and as previously described (31).

RESULTS

Bioinformatic prediction of candidate Short Intron-Derived ncRNA (SID)

In order to systematically identify SID in the human genome database we used the workflow outlined in the Materials and Methods section and in Supplementary Figure S1, starting from 326 537 introns extracted from the UCSC table browser. Since 100% of pre-miRNA hairpins (calculated from miRBase) and 95% of snoRNAs (calculated from snoRNABase) are no longer than 200 nt, our computational strategy retained only introns <200 nt but >50 nt so that they can form a hairpin compatible with the size of a pre-miRNA or a snoRNA. This restriction was also chosen with further experimental validation in mind, since RNA size fractionation imposes a cut-off around 150–200 nt. Although splicing of introns is a prerequisite to produce mirtrons and snoRNAs, we reasoned that alternative splicing events would be of prime interest to account for their regulatory functions during cellular differentiation or to integrate external or environmental signals. Therefore, the 28 118 extracted introns of 50–200 nt in size were further compared with 92 222 alternative splicing events from UCSC (39), of which 6450 represent retained introns. This first step led to the identification of 2147 short introns subjected to alternative splicing. In addition to effective splicing and intron length, structural characteristics are also important to define a SID. We thus selected 218 unique introns shorter than 200 nucleotides, which contained inverted repeats necessary for the formation of a stem–loop structure. These short hairpin introns, unlike typical introns that are rapidly degraded following splicing (40), are much more stable in the cell. 
Thus, out of these 218 introns, we retained introns that matched collections of small RNAs of 20–200 nt in length isolated from tissues or sub-cellular compartments from ENCODE cell lines (http://genome.ucsc.edu/ENCODE/) and deep sequencing of small RNA-seq samples from ENCODE project (GEO Series accession number GSE24565), potentially containing both discrete and stable introns or mature miRNAs. Overall, we identified 56 introns that are compatible in length, structure and sequence, with the formation of SID hairpins and for which small RNAs exist in small RNAs databases (Figure 1 and Supplementary Table S1).
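The inverted-repeat criterion used in this workflow can be illustrated with a simplified Python search for two arms that are reverse complements of each other. It is a sketch under stated assumptions and not the ‘palindrome search’ tool used in the study: it tests a single fixed arm length, counts only Watson-Crick mismatches (no G-U pairs, no bulges), and the default values of `arm_len`, `max_mismatches` and `min_gap` are illustrative choices.

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA sequence (A, C, G, T only)."""
    return seq.translate(_COMPLEMENT)[::-1]


def hamming(a: str, b: str) -> int:
    """Number of mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def has_inverted_repeat(
    seq: str,
    arm_len: int = 18,
    max_mismatches: int = 5,
    min_gap: int = 3,
) -> bool:
    """True if some arm of `arm_len` nt has a downstream reverse-complement
    partner with at most `max_mismatches` mismatches, separated by a loop
    of at least `min_gap` nt.
    """
    seq = seq.upper()
    n = len(seq)
    for i in range(n - 2 * arm_len - min_gap + 1):
        target = reverse_complement(seq[i:i + arm_len])
        for j in range(i + arm_len + min_gap, n - arm_len + 1):
            if hamming(target, seq[j:j + arm_len]) <= max_mismatches:
                return True
    return False


arm = "ACGTTGCAAGGCTTAGCA"  # arbitrary 18-nt arm
hairpin_like = arm + "TTTT" + reverse_complement(arm)
found = has_inverted_repeat(hairpin_like)
not_found = has_inverted_repeat("A" * 60)
```

The quadratic scan is adequate here because the candidate introns are shorter than 200 nt.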
Figure 1.

Typical examples of SID candidates. (A) Schematic representation of the human LINC00085 gene located on chromosome 19 (chr19:52,196,593-52,208,443, described as SID #39 in Supplementary Table S1). The sixth intron was identified as a retained intron in the database. In this particular example, the gene considered was described as a non-coding RNA (NR_024330). (B) Schematic representation of the human SETD6 gene located on chromosome 16 (chr16:58,549,383-58,554,456, described as SID #8 in Supplementary Table S1). The second intron was identified as a retained intron in the database. In this particular example, the two transcript variants 1 and 2 have already been described as NM_001160305 (containing the intron in frame with the ORF) and NM_024860, respectively. (C) Schematic representation of the human LTC4S gene located on chromosome 5 (chr5:179,220,986-179,223,512, described as SID #47 in Supplementary Table S1). Although its four introns were described as retained introns in the database, only the second intron was compatible in size. (A–C) A zoom capture is presented below each gene map including small RNA-seq data obtained from ENCODE in three cell lines (GM12878, K562 and prostate cells, and fractionated data when available). Sequence and structure determined using RNALfold are also indicated.


Structural features of candidate Short Intron-Derived ncRNA (SID)

We then analyzed structural features of the putative 56 new human SID compared to the excluded normal introns (NI) and to annotated pre-miRNAs and snoRNAs extracted from miRBase v17 and snoRNABase v3, respectively. We showed that human SID exhibit a higher G+C content (Figure 2A) and preferentially reside within CpG island regions (Figure 2B). Furthermore, the stability of their secondary structure is higher than that of NI, snoRNAs (41) or random sequences as determined using RNALfold, and is comparable to that of annotated pre-miRNAs (Figure 2C). Altogether, these findings suggest that formation of a pre-miRNA-like structure is not a common feature of introns but is a structural characteristic of both canonical pre-miRNAs and SID. Interestingly, searching for genomic common single-nucleotide polymorphism (SNP), we found that the mean number of SNP per intron is almost three-times higher in NI than in SID (Figure 2D). These results are consistent with a higher selective pressure on SID than on NI, equivalent to that on pre-miRNA (42) and snoRNAs (43), and suggestive of the functional importance of these sequences.
Figure 2.

Structural and sequence features of newly identified SID. (A) G+C content of NI (introns; n = 1713), SID (n = 56), pre-miRNAs (from miRBase v17; n = 1426), snoRNAs (sno, from snoRNABase v3; n = 402), and random sequences (generated as described in Materials and Methods; n = 10 000), all with similar nucleotide length, were analyzed and plotted as box-and-whisker diagrams. (B) Percentage of sequences that mapped in an annotated CpG island region. (C) Corrected MFE (minimum free energy) was calculated for each sequence by dividing the absolute value obtained from RNALfold program (see Materials and Methods) by the corresponding sequence length. Data were plotted as box-and-whisker diagrams. (D) Total number of SNP was calculated for each dataset using Galaxy tools and the dbSNP132 track and a mean of SNP per sequence was then plotted. (A and B) * mean P < 0.001 (Student's t-test). (C and D) Statistical analyses were performed using Chi-squared test.

Our approach was further validated with the finding that previously identified human mirtrons (44) like pre-miR-1234 (SID #44) and pre-miR-6743 (SID #26) were present in our data set (Supplementary Table S1). Moreover, two of the identified host genes, TCIRG1 (T-Cell, Immune Regulator 1, ATPase, H+ Transporting, Lysosomal V0 Subunit A3) hosting SID #12 and CPSF1 (Cleavage and polyadenylation specificity factor 1) hosting SID #44 and #46, were also host genes for agotrons (19).

Experimental validation of SID candidates

To test whether the 56 potentially new human SID could be detected as discrete entities, we developed low-density oligonucleotide arrays hybridized with RNA extracted from cultured cells and fractionated in size (Supplementary Figure S2). Oligonucleotides matching the 56 SID were spotted as duplicates. Positive controls for hybridization were designed to detect highly expressed RNU6-1 snRNA (U6) and the hsa-miR-21 expressed in most tumour cell lines. Negative controls consisted of randomly chosen oligonucleotides. Oligonucleotides matching exons of housekeeping mRNAs served as controls for size fractionation since exon-containing mRNAs should not be found in the small RNA fraction. Arrays were hybridized using radio-labelled RNA probes extracted from six different human cell types and fractionated to retain RNAs shorter than 200 nt in length. Hybridization signals were quantified and normalized as described in the Material and Methods section. Out of the 56 candidates, 36 showed signals above baseline levels in at least one cell type, representing our best candidates as new human SID (Supplementary Table S1). Although SID candidates were selected using a specific workflow that integrates alternative splicing events, we first validated expression of their host gene as well as alternative splicing of their RNA precursors as measured by detection of intron-retaining or -spliced isoforms, using radioactive RT-PCR (Supplementary Figures S3 and S4). All but 3 (#1, #27, #47) of the 36 SID candidates showed retention and excision events, even weak, in at least one of the six chosen cell lines. Candidates #6 and #8 showed intermediate amplification products in all cell lines studied and #36 showed intermediate amplification products in MDA-MB-231 cells (Supplementary Figures S3), consistent with alternative acceptor or donor splicing sites already reported for candidates #6 and #8, respectively (http://fasterdb.ens-lyon.fr). 
Cloning and sequencing of candidate #36 intermediate product also identified an alternative non-canonical (GT→GA) donor splicing site (Supplementary Figures S5). As a whole, 33 SID (Supplementary Table S1) were detectable in at least one of the six cell lines, and showed variable levels consistent with introns originating from alternative splicing events being cell-type- or differentiation-stage- specific. For example, SID #10, #24 and #39 showed higher levels in ERalpha-negative, highly invasive, fibroblast-like MDA-MB-231 than in ERalpha-positive, weakly invasive, luminal epithelial-like MCF-7. In contrast, SID #42 showed higher levels in MCF-7 cells than in MDA-MB-231. Similarly, SID #39 and #43 showed higher levels in undifferentiated human myoblasts than in their differentiated counterpart, whereas SID #8 and #53 increased during myoblast differentiation. Some SID, such as #10, #16, #24 and #39, were detected in all six cell types at high levels. In contrast, high levels of #42 and #46 were a hallmark of cancer cells whatever the origin of the tumour, whereas levels of #21 were low in cancer cell lines compared to normal myogenic cells.

Biogenesis of miRNAs from SID candidates

In order to decipher the pathways involved in the biogenesis of miRNAs from SID, we used the 19 candidates that showed signals above baseline levels in the ENCODE common cell line K562, in which we knocked down all major players involved in miRNAs, mirtrons, simtrons and snoRNAs biogenesis, using RNA interference (see Material and Methods section and Supplementary Table S2). Size fractionated RNAs extracted from K562 were radio-labelled and then used as probes on low-density oligonucleotide arrays to assess the impact of knock down assays on individual SID levels. RT-PCR and western blots controlling efficiency of RNA interference are shown in Supplementary Figure S6A and B, respectively. We used shRNA directed against luciferase gene as a mock control (Supplementary Table S2). We first performed experiments to knock down Dicer and Ago2 (Figure 3). In addition to controls for size fractionation already mentioned earlier for the experimental validation of SID candidates, the increased signal following Dicer knockdown suggests that only SID, i.e. precursors of smaller ncRNAs that accumulate as the result of their impaired processing into smaller ncRNAs, are detectable by the method. For all SID candidates but 2 (#36 and #47), we observed an accumulation of SID in conditions of reduced levels of Dicer protein (Figure 3), suggesting that Dicer is indeed involved in SID maturation into smaller RNAs as shown for canonical miRNAs. As expected, decreased expression of Ago2 did not affect levels of most SID, although increased levels of SID #36 might be explained by the previously described dicing activity of Ago2 that can replace Dicer in pre-miRNA processing (24,25).
Figure 3.

Role of Dicer and Ago2 in biogenesis of SID. Oligonucleotide arrays developed in Supplementary Figure S2, and spotted with the SID that show a signal above baseline in K562 cells (Supplementary Table S1), were hybridized with radio-labelled small RNA isolated from K562 cells in which Dicer, Ago2 or Luc were knocked down (n = 4). Hybridization signals were plotted as a fold-over baseline signal as described in Material and Methods section. Luc, control luciferase knockdown; Dcr, Dicer knockdown; Ago, Ago2 knockdown. The hsa-miR-21 (miR21) was used as positive control. Significant differences were assessed using Student's t-test (*P < 0.05). Error bars represent standard error of the mean (SEM).

We then used shRNAs directed against Drosha and DGCR8, which compose the microprocessor (45), snRNP70+PRP8 and U2AF65+PRP8, which are components of the splicing machinery (18), and DBR1, which debranches intron lariats (46) (Figure 4). The 19 SID showed distinct dependency on one to several of these proteins, allowing us to classify them into five different categories: (i) the members of the first category correspond to classical pre-miRNAs and showed dependency on the microprocessor only. It included pre-miR-21, used here as a control, and SID #16, #18, #19 and #20; (ii) the second category was composed of one mirtron candidate (#8), as it showed dependency on the debranching enzyme DBR1 but not on the microprocessor components or snRNP70+PRP8 and U2AF65+PRP8; (iii) SID #45 was dependent on Drosha and snRNP proteins, as shown previously for simtrons (17,18); (iv) a new category of candidates, including #24, #39, #42, #43, #46, #47 and #52, showed dependency on both the microprocessor (Drosha and DGCR8) and DBR1; (v) unclassified SID for which no dependency could clearly and/or significantly be determined, corresponding to SID #10, #21, #33, #36, #38 and #50.
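The five-way categorization can be made concrete with a small sketch. The Python function below is only an illustration of the decision rules as stated in the text; the function name `classify_sid`, the factor labels and the strictness of each rule (for example, requiring both Drosha and DGCR8 for category iv) are assumptions made for this example, and the inputs used are hypothetical rather than the measured data.

```python
from typing import Set

# Factor labels are illustrative; they mirror the knockdowns named in the text.
MICROPROCESSOR = {"Drosha", "DGCR8"}
SPLICING = {"snRNP70+PRP8", "U2AF65+PRP8"}


def classify_sid(dependencies: Set[str]) -> str:
    """Map the set of knockdowns that significantly changed a SID's level
    to one of the five categories described in the text (assumed rules)."""
    has_dbr1 = "DBR1" in dependencies
    has_splicing = bool(dependencies & SPLICING)

    if not dependencies:
        return "unclassified"
    # (i) dependency on the microprocessor only
    if dependencies <= MICROPROCESSOR:
        return "classical pre-miRNA"
    # (ii) dependency on DBR1 only
    if dependencies == {"DBR1"}:
        return "mirtron candidate"
    # (iii) dependency on Drosha and splicing factors, not on DBR1
    if "Drosha" in dependencies and has_splicing and not has_dbr1:
        return "simtron-like"
    # (iv) dependency on both microprocessor components and DBR1
    if MICROPROCESSOR <= dependencies and has_dbr1 and not has_splicing:
        return "microprocessor + DBR1"
    # (v) anything else is left unclassified
    return "unclassified"


if __name__ == "__main__":
    print(classify_sid({"Drosha", "DGCR8", "DBR1"}))
```

Any combination not covered by rules (i) to (iv) falls through to the unclassified group, which matches the conservative way the fifth category is defined above.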
Figure 4.

Role of Drosha, DGCR8, snRNP70+PRP8, U2AF65+PRP8 and DBR1 in biogenesis of SID. Oligonucleotide arrays developed in Supplementary Figure S2, and spotted with the SID that show a signal above baseline in K562 cells (Supplementary Table S1), were hybridized with radio-labelled small RNA isolated from K562 cells in which Drosha, DGCR8, snRNP70+PRP8, U2AF65+PRP8, DBR1 or Luc were knocked down (n = 4). Hybridization signals were plotted as a fold-over baseline signal as described in Material and Methods section. Luc, control luciferase knockdown; Ds, Drosha knockdown; Dg, DGCR8 knockdown; 1+8, snRNP70+PRP8 knockdown; 2+8, U2AF65+PRP8 knockdown; Db, DBR1 knockdown. The hsa-miR-21 (miR21) was used as positive control. Significant differences were assessed using Student's t-test (*P < 0.05). Error bars represent standard error of the mean (SEM).


Physical interaction of SID candidates with key factors involved in biogenesis of miRNAs

Because knock down experiments could lead to indirect effects, and to test whether proteins involved in these pathways act directly on the processing of SID into miRNAs (Figure 5), we performed native RIP to detect interactions of a given SID with proteins implicated in the canonical miRNA pathway (microprocessor proteins Drosha and DGCR8), in intron-lariat debranching (DBR1) or in splicing (snRNP70, component of the U1 complex; U2AF65, component of the U2 complex). We also assessed the interaction of SID with Ago2, which is a prerequisite to suggest loading into the RISC complex and hence, to suggest a regulatory function for the resulting miRNA. Western blots controlling efficiency of immunoprecipitation assays are shown in Figure 5A. Results shown in Figure 5B are expressed as a fold enrichment relative to immunoprecipitation using an irrelevant antibody.
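As a rough illustration of how a fold enrichment relative to an irrelevant antibody can be derived from qPCR data, the sketch below uses the common delta-Ct formulation. This is an assumption: the exact calculation used in the study is described in its Material and Methods section, not here. The function name, the default amplification efficiency of 2 (perfect doubling per cycle) and the Ct values in the usage line are all hypothetical.

```python
def fold_enrichment(ct_ip: float, ct_irrelevant: float, efficiency: float = 2.0) -> float:
    """Fold enrichment of a target in a specific IP over an irrelevant-antibody IP.

    Assumes a delta-Ct model: each cycle of earlier detection corresponds to an
    `efficiency`-fold higher starting amount (2.0 means perfect doubling).
    """
    if efficiency <= 1.0:
        raise ValueError("efficiency must be greater than 1")
    return efficiency ** (ct_irrelevant - ct_ip)


if __name__ == "__main__":
    # Hypothetical Ct values: the specific IP is detected 3 cycles earlier.
    print(fold_enrichment(ct_ip=25.0, ct_irrelevant=28.0))
```

Under this model a value of 1 means no enrichment over the irrelevant antibody, so only values clearly above 1 would be read as an association.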
Figure 5.

SID interactions with miRNA pathway key factors. (A) Native RIP was performed using the indicated antibodies and the effective protein precipitation was confirmed by western blotting. IP, immunoprecipitation; in, 5% input. (B) Co-precipitated RNAs were extracted as described in Material and Methods section (n = 2). Quantitative PCRs were performed using specific primer pairs (Supplementary Table S2) to amplify SID #8, #16, #18, #19, #20, #24, #39, #43, #45 and #46. Amount of immunoprecipitated material is expressed as fold enrichment compared to irrelevant antibody. Ds, Drosha; Dg, DGCR8; Ago, Ago2; Dbr, DBR1; U1, snRNP70; U2, U2AF65. Significant differences with irrelevant antibody were assessed using Student's t-test (*P < 0.05). Error bars represent standard error of the mean (SEM).

As anticipated from the data described above, SID #16, #18, #19 and #20 were significantly associated with Drosha and DGCR8, but not with DBR1, snRNP70 or U2AF65. Likewise, SID #45, which we classified as simtron-like, showed a significant association with Drosha and snRNP70, as expected (17,18). SID #43, whose dependence on Drosha and DBR1 we demonstrated by RNA interference (Figure 4), was indeed associated with these two proteins. However, SID #8, which we classified as a mirtron according to its dependency on DBR1 but not on the microprocessor (14), unexpectedly co-precipitated with Drosha and snRNP70 as well as with DBR1. Because of the direct interactions described above, SID #8, which we defined as a mirtron, should be referred to as a mirtron-like RNA since it also interacts with snRNP70, which has never been described in the case of mirtrons. Similarly, SID #45, which we classified as a simtron, should be referred to as simtron-like because of its interaction with Ago2, which is dispensable in the simtron pathway (17).
Of note, it is not so surprising to find the full SID associated with the Ago2 complex since Dicer, Ago2 and TAR RNA-binding protein 2 (TRBP2) that compose the RISC-loading complex (RLC) are first loaded together on the stem–loop before cleavage of the loop by Dicer and incorporation of the guide-strand into the RISC complex and repression of the target RNAs (47–50). Data from Figures 3, 4 and 5 relative to the biogenesis of SID candidates (interference experiments and native RIP assays) are summarized in Supplementary Table S3 in which we propose a categorization of SID candidates.

Mature miRNAs originating from SID candidates

In order to confirm that SID candidates are genuine precursors of smaller ncRNAs, in particular mature miRNAs, we developed a northern blot procedure dedicated to the detection of endogenous small RNAs (see Material and Methods section). We arbitrarily selected SID candidates from each category described above, i.e. #8, #10, #18, #43, #45 and #46. As shown in Figure 6, northern blot analyses further revealed that endogenous miRNAs originating from these six SID were indeed detectable in the small fraction of RNAs. These results strongly suggest that these introns meet the requirements to be classified as new human SID. For reasons that are not yet clear, the intermediate product corresponding to the hairpin (pre-miRNA) was not detected in this study, except for SID #8 and SID #43. Likewise, many pre-miRNAs have not been detected in other studies either, as for hsa-miR-148a (51), hsa-miR-17 (52) or mmu-miR-103 (53). It must also be emphasized that only the mature small RNA with the expected size for miRNAs was detected using a probe against SID #45, which we classified as simtron-like. Indeed, in the case of simtrons, the primary sequence (pri-miRNA) does not correspond to the intron alone but to the pre-messenger RNA (which is larger than 200 nt and therefore eliminated by the size fractionation).
Figure 6.

Small RNA isolated from HEK-293 cells were analysed by northern blot using specific radio-labelled probes complementary to SID #8, #10, #18, #43, #45 and #46. The corresponding acrylamide gels stained with ethidium bromide are shown. The upper and lower bands corresponding to the SID and its miRNA product, respectively, are indicated by small arrows, and the intermediate product for #8 and #43 is indicated by an arrow-head.


Characterization of a new mature miRNA processed from SID #43

SID #43 corresponds to intron 1 of the SRA1 (Steroid Receptor RNA Activator 1) gene, for which we already reported the importance of alternative splicing in other contexts (30,31,38,54). In order to clone and sequence the exact mature sequence of the miRNA processed from intron 1 of SRA1, we used the method that we recently developed (35), based on in vitro transcription, RNA pull down and adapted RACE-PCR techniques. As shown in Figure 7A, we were able to characterize the mature SRA miRNA, allowing us to assess its expression profile in myoblasts and myotubes using stem–loop RT-qPCR. We used hsa-miR-1224 and hsa-miR-199a2 as positive controls. During myogenic differentiation of healthy human myoblasts to myotubes, we observed the previously reported increase in miR-199a2 (55). Consistent with the previously described decrease in SID #43 levels during the course of myoblast differentiation (Supplementary Table S1), levels of the mature miRNA derived from SID #43 also decreased as measured by stem–loop RT-qPCR (Figure 7B). It is worth noting that the decrease in both SID #43 and mature miRNA levels was consistent with the increased retention of intron 1 in long SRA isoforms that we recently reported during this process (31). We also detected the mature miRNA that derives from SID #43 in cells isolated from DM1 patients (see Material and Methods) by stem–loop RT-qPCR (Figure 7B). However, and in contrast to what was observed in muscle satellite cells from healthy donors, we did not detect changes in the levels of mature SRA miRNA, consistent with the previously identified defect in alternative splicing of SRA intron 1 in DM1 cells (31).
Figure 7.

A novel miRNA is produced from SID #43 (SRA intron 1). (A) Out of the 16 clones that were sequenced, the six corresponding to SRA intron 1 are positioned. (B) Relative expression of indicated mature miRNA amplified by stem–loop RT-qPCR in undifferentiated or differentiated human myogenic cells (n = 3). 1224, mature miR-1224; 199a2, mature miR-199a2; SRA, mature SRA-miRNA; exp, expression; Healthy, cells from healthy control; Patients, cells from DM1 patients (n = 3). Significant differences were assessed using Student's t-test (*P < 0.05). Error bars represent standard error of the mean (SEM).


Prediction of target genes of mature miRNAs produced from SID #8 and SID #43

Because the prediction of a stem–loop secondary structure is not sufficient to predict a miRNA seed sequence without cloning and sequencing of the mature miRNA, as performed above for SRA miRNA (from SID #43), we mined the large amount of publicly available small RNA-seq data to identify the most likely sequence for a mature miRNA produced from candidate SID. Only in the case of SID #8 did we find enough sequencing reads to delineate a miRNA sequence (Supplementary Figure S7). Using the miRNA sequence from SID #43 characterized above and that from SID #8 inferred from RNA-seq data, we predicted target genes as described in the legend of Supplementary Figure S8 and assessed their expression levels using RNA-seq data from human cell lines. Interestingly, in cell lines expressing high levels of the candidate mature miRNAs originating from SID #8 (MCF7) or SID #43 (MB), expression levels of the predicted target genes were lower than in cell lines in which the miRNA was not expressed or was expressed at low levels (HUVEC for SID #8 and MT for SID #43). Such anti-correlated expression profiles between miRNAs and their predicted target genes suggest that miRNAs produced from candidate SID identified in this study are genuine regulatory small ncRNAs.
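One simple way to summarize such an anti-correlation is the median log2 ratio of predicted target expression between a cell line with high miRNA levels and one with low levels. The sketch below is a hypothetical illustration, not the analysis performed in the study: the function name, the pseudocount, the gene labels and the expression values are all invented for the example.

```python
import math
from statistics import median
from typing import Dict


def median_log2_ratio(
    expr_high_mirna: Dict[str, float],
    expr_low_mirna: Dict[str, float],
    pseudocount: float = 1.0,
) -> float:
    """Median log2(high/low) expression over genes present in both inputs.

    A negative value means the predicted targets tend to be lower in the
    cell line where the miRNA is highly expressed.
    """
    shared = sorted(set(expr_high_mirna) & set(expr_low_mirna))
    if not shared:
        raise ValueError("no genes shared between the two expression tables")
    ratios = [
        math.log2(
            (expr_high_mirna[gene] + pseudocount)
            / (expr_low_mirna[gene] + pseudocount)
        )
        for gene in shared
    ]
    return median(ratios)


if __name__ == "__main__":
    # Hypothetical expression values for placeholder target genes.
    high = {"GENE_A": 3.0, "GENE_B": 7.0, "GENE_C": 1.0}
    low = {"GENE_A": 7.0, "GENE_B": 15.0, "GENE_C": 3.0}
    print(median_log2_ratio(high, low))
```

The median is used here rather than the mean so that a single strongly expressed target does not dominate the summary; a proper analysis would also compare against non-target genes.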

DISCUSSION

Alternative splicing is the first phenomenon that made scientists realise that genomic complexity is not proportional to the number of protein-coding genes. More recently, advances in high-throughput transcriptome analysis highlighted the pervasive nature of transcription of mammalian genomes and revealed an ever-growing number of non-protein-coding yet functional transcripts, whose number parallels genome complexity (6,56). Furthermore, the recent characterization of RNAs for which both coding capacity and activity as functional RNAs have been reported adds a further degree of complexity (30,31,57,58). We recently proposed that intron retention contributes to the diversification of the information carried by genes by producing functional RNA instead of a protein product (58). This concept was exemplified by SRA1, for which transcripts exist as coding and non-coding isoforms, through alternative splicing of the first intron (30,31). Adding to the complexity, a number of functional ncRNAs are known to be hosted within introns (59). Although many of them can be produced independently of the transcription of their host gene, others are produced in a post-splicing manner, suggesting that alternative splicing might also serve to produce regulatory RNAs (60). For example, snoRNAs and miRNAs can originate from the processing of debranched introns after their excision from the pre-mRNA (16,59,61–63). Recent work reported that a subset of these spliced and debranched introns indeed has the ability to form hairpin structures compatible with their export to the cytoplasm and further processing into miRNAs by Dicer in the case of mirtrons (9,15), or bypassing Dicer in the case of agotrons (19). Thus, a single transcription unit could generate multiple molecules including proteins and long or smaller regulatory ncRNAs such as miRNAs, depending upon the need of the cell to respond to particular environmental conditions. 
We report here the bioinformatic prediction of SID RNAs, for which short introns produced by alternative splicing and their complementary miRNAs could be extracted from public databases. Owing to their splicing origin and key structural attributes, SID are indeed well suited for a relatively efficient bioinformatic prediction and distinction from the bulk of introns. Based on comparative genomics and computational search for conserved intronic hairpins, earlier studies identified a few human mirtron candidates (9,16). Here, we proposed a bioinformatic approach coupled with experimental validation to identify potential new human SID and characterize their biogenesis pathway. Although miRNAs could originate from constitutive or alternative splicing events, we decided to focus on the latter case. Even though less abundant and hence enabling easier experimental validation, this was intended to consider their potential contribution to cell fate and identity. Previous predictions of human mirtrons and intronic miRNAs were based on conservation and hairpin formation (9,11,13,14,16,44,64–70) but never on splicing events. Despite these approaches being different from the one proposed herein, we also computationally identified SID producing hsa-miR-6743 (#26) and hsa-miR-1234 (#44), previously described as mirtrons (66). Although levels of these SID were too weak for further experimental validation, we now show that they originate from an alternative splicing event of the transcripts RIC8A (Resistance to Inhibitors of Cholinesterase 8 homolog A) and CPSF1 (Cleavage and Polyadenylation Specificity Factor), respectively. CPSF1 gene encodes the largest subunit of the CPSF complex, a multi-subunit complex that plays a central role in 3΄ processing of pre-mRNA (71). The gene locus appears to be very complex in that it contains around 50 introns, of which, interestingly, 11 are alternatively spliced (72). 
We identified an additional SID produced by alternative splicing of CPSF1 (#46), which is distinct from the previously reported SID producing hsa-miR-1234 (9), hsa-miR-939 (16), and an agotron (19) also hosted in this gene. It is not surprising to find multiple SID in genes containing high numbers of introns, which in addition are subject to alternative splicing. Along the same lines, it is not surprising to find a higher density of SID on chromosome 19 (12/56) that contains the highest density of genes (8,73), but more importantly, the highest number of genes with short introns (http://www.ncbi.nlm.nih.gov/genome) compatible with their processing and formation of stable hairpin structures (29). Like CPSF1, chromosome 19 appears to host both SID and canonical miRNAs precursors as it contains the largest cluster of pre-miRNAs identified in mammals, where at least 46 different pre-miRNAs were found within a 100-kb region (74). We did not restrict our search to introns of protein-coding genes since the possibility that non-coding RNAs might also produce shorter RNAs has already been proposed (59,75,76) as exemplified by SNHG (Small Nucleolar RNA Host Gene) family of genes and GAS5 (Growth Arrest Specific 5) that are non-coding multi-snoRNAs hosting genes (77), or by MIR17HG also known as the miR-17/92 cluster (78). Consistent with this, we identified SID candidates #39, #16 and #43 respectively hosted in LINC00085 (long intergenic non-protein coding RNA 85, NR_024330), RHBG (Rh family, B glycoprotein, which variant 2 is a non-coding isoform, NR_026549) and SRA1 [which we previously described as both non-coding RNAs and mRNA isoforms, see ref. (30,31,38,54)]. Interestingly, RHBG, similarly to what was shown for SRA, could represent a new bifunctional RNA since retention of its intron disrupts the ORF whereas, as for classical alternatively spliced mRNA, splicing preserves the ORF (30,31,58). 
In both cases, we have now shown that alternative splicing of introns not only favours the production of a protein product, but also promotes the formation of a miRNA. Adding a degree of complexity, alternative donor and acceptor splicing sites might also contribute to the diversification of the RNA forms produced, i.e. mRNA, long ncRNA and/or short ncRNA, as might be the case for SID #6, #8 and #36. Whether the shorter SID generated by alternative donor or acceptor sites can also produce a miRNA in certain cellular contexts remains to be tested. Although bioinformatic prediction of human SID eased their identification based on conservation, splicing origin or structural features [see refs. (9,11,13,14,16,44,64,65,67) and the present study], computational analysis is likely to generate large numbers of false positive predictions. Indeed, out of the first 19 mirtrons previously identified (9), 3 were experimentally tested and one of them (hsa-miR-1233 precursor) was suggested not to be a mirtron or a simtron (68). This stresses the need to experimentally test these SID candidates and characterize their biogenesis pathways. Our primary goal was to identify SID originating from alternative splicing events that are cell-type- or differentiation-stage-specific. However, we also identified intronic miRNAs, whose biogenesis should not, in theory, rely on splicing mechanisms. Indeed, intronic miRNAs are either transcribed independently of, or share a promoter with, their host gene (79–82), but in both cases transcripts that correspond to pri-miRNAs should be much longer than the intron alone. However, SID #18, which we defined as a miRNA, had the exact size of the spliced intron, suggesting that, at least for this candidate, splicing was a prerequisite to produce the corresponding intronic miRNA. 
An apparent discrepancy also exists if we consider the levels of SID-hosting transcript isoforms (Supplementary Figures S3 and S4) and those of the SID candidates (summarized in Supplementary Table S1). However, the fact that only SID and not mature miRNAs are detected on the oligonucleotide arrays could explain this discrepancy for SID that are rapidly processed in a given cellular context. In addition, the stability and fate, and hence the detectable levels, of the spliced intron are independent of those of the longer transcripts from which it derives. Such an example already exists with EGFL7 (epidermal growth factor domain 7), which hosts hsa-miR-126 in intron 6. EGFL7 was shown to be highly expressed in lung and to a lesser extent in heart, whereas the opposite was observed for hsa-miR-126 [(83) and http://www.microrna.org]. In agreement with splicing proposed to be required for the production of non-canonical precursors of miRNAs, we report the experimentally tested dependence of several SID on components of the splicing machinery (snRNP70+PRP8, U2AF65+PRP8) and of the microprocessor (Drosha, DGCR8), as exemplified by SID #8, #43, #45 and #46. Several other studies have reported a crosstalk between spliceosome and microprocessor. One striking example comes from plants, in which production of ath-miR-163 in Arabidopsis thaliana is favoured by components of the snRNP70 that potentiate miRNA biogenesis by hiding the proximal polyA signal and thus prevent polyadenylation and premature cleavage/degradation of the pri-miRNA (84). Other examples of such interplay have been reported in mammals, but almost invariably represent a competition between components of the spliceosome and those of the microprocessor (61,84–86). Interestingly, Drosha and DGCR8 were found in association with the spliceosome machinery in a supraspliceosome context, where knockdown of Drosha resulted in increased splicing whereas inhibition of the spliceosome enhanced miRNA production (87). 
To date, the canonical pathway (Drosha/DGCR8 → XPO5 → Dicer → Ago2) is considered the main biogenesis pathway for the majority of miRNAs (88). In the last few years, alternative pathways for miRNA biogenesis have also been reported. The first identified, the mirtron pathway, bypasses the cleavage by the microprocessor (9,15) since spliced introns, first debranched by DBR1, are directly used as pre-miRNAs and exported to the cytoplasm by XPO5. The simtron pathway (17,18) uses the snRNP70 small nuclear ribonucleoprotein to recruit Drosha, which directly processes the stem–loop structure from the pre-mRNA to produce the pre-miRNA. This mechanism is independent of DGCR8, which usually recognizes pri-miRNAs and directs Drosha RNAse III activity to release the pre-miRNA hairpin (89). We now report that these pathways are likely to be more complicated and inter-connected than previously expected, at least for miRNAs produced from short introns (SID). We described four intronic pre-miRNAs, one simtron-like and one mirtron-like RNA, but also several other SID whose biogenesis relies on both the microprocessor (Drosha and DGCR8) and the debranching enzyme (DBR1). Nonetheless, we do not exclude the possibility that miRNAs originating from SID could be produced through multiple pathways, either depending on the host-gene expression/regulation, or relying on the intron excision/retention balance. Among the new SID that we experimentally validated, we focused on the interesting case of the miRNA contained in the first intron of SRA RNA. SRA RNA was first described as a non-coding RNA (90) that acts as a transcriptional co-activator of nuclear receptors [for recent reviews see refs. (91,92)]. Since then, we described new isoforms produced by alternative splicing (30,31,54). SRA RNA was then proposed as the first member of a new class of bifunctional RNAs since both a protein and a non-coding RNA can be produced from the same genetic entity (30,31,54). 
We now report that intron 1 of SRA is detectable as a discrete entity in all six cell types that we have tested, although less abundantly in differentiated primary myotubes. Besides, we have recently found that coding (spliced intron 1) SRA isoforms were more abundant in undifferentiated human myogenic cells, whereas non-coding (intron 1 retaining) SRA isoforms increased during myogenic differentiation and enhanced myogenic differentiation and myogenic conversion of non-muscle cells through the co-activation of MyoD activity (31). Interestingly, the amounts of intron 1 and of the derived mature miRNA were also higher in undifferentiated myoblasts, suggesting that during myogenic differentiation, splicing of SRA intron 1 parallels the production of a SID. We previously showed that induction of muscle differentiation was not accompanied by a change in the ratio between non-coding and coding SRA isoforms in DM1 cells, in contrast to non-affected muscle cells (31). We showed here that the levels of intron 1 of SRA were not affected during the course of differentiation of DM1 cells. While we highlighted a role of non-coding and coding SRA isoforms in the differentiation of normal cells (31), a role of the SID (intron 1), and more likely of the mature miRNA derived from it, cannot be excluded. Indeed, the impact of the SRA mature miRNA could be direct (by inhibiting the non-coding SRA isoforms retaining intron 1) or indirect (through other targets remaining to be identified). Interestingly, among the top five predicted target genes, and although not much data is available on FBXO41 (F box protein 41) or PRX (Periaxin) in the literature, the three other candidates have a potential link with myoblast differentiation. DAGLA (Diacylglycerol Lipase Alpha) expression is significantly increased upon myogenic differentiation in vitro (93). 
NECTIN1 (Nectin Cell Adhesion Molecule 1) belongs to the family of cellular adhesion molecules that play central roles in cell adhesion, cell motility, proliferation and survival, and contributes to the morphogenesis and differentiation of many cell types and tissues (94,95). Increased levels of MECP2 (MEthyl CpG binding Protein 2) during myogenic differentiation have been linked to global heterochromatin rearrangements that occur during, and are a prerequisite for, myogenic terminal differentiation (96). Of course, the whole range of targets of the miRNAs originating from the SID identified in this study remains to be identified in a given cellular context. As a whole, we present evidence for the existence of new classes of Short Intron-Derived ncRNAs, which we termed SID, that regroup agotron, mirtron and simtron RNAs. In addition, we uncovered new types of miRNAs whose biogenesis appears to be splicing-dependent but nevertheless closely connected to the classical miRNA pathway.

Authors:  Kaiqiang Fu; Suwen Tian; Huanhuan Tan; Caifeng Wang; Hanben Wang; Min Wang; Yuanyuan Wang; Zhen Chen; Yanfeng Wang; Qiuling Yue; Qiushi Xu; Shuya Zhang; Haixin Li; Jie Xie; Mingyan Lin; Mengcheng Luo; Feng Chen; Lan Ye; Ke Zheng
Journal:  BMC Biol       Date:  2019-05-14       Impact factor: 7.431

9.  Coding and Non-coding RNAs, the Frontier Has Never Been So Blurred.

Authors:  Florent Hubé; Claire Francastel
Journal:  Front Genet       Date:  2018-04-18       Impact factor: 4.599

