Teresa J Filshtein, Craig O Mackenzie, Maurice D Dale, Paul S Dela-Cruz, Dale M Ernst, Edward A Frankenberger, Chunyan He, Kaylee L Heath, Andria S Jones, Daniel K Jones, Edward R King, Maggie B Maher, Travis J Mitchell, Rachel R Morgan, Sirisha Sirobhushanam, Scott D Halkyard, Kiran B Tiwari, David A Rubin, Glen M Borchert, Erik D Larson.
Abstract
MicroRNAs coordinate networks of mRNAs, but predicting specific sites of interactions is complicated by the very few bases of complementarity needed for regulation. Although efforts to characterize the specific requirements for microRNA (miR) regulation have made some advances, no general model of target recognition has been widely accepted. In this work, we describe an entirely novel approach to miR target identification. The genomic events responsible for the creation of individual miR loci have now been described, with many miRs now known to have been initially formed from transposable element (TE) sequences. In light of this, we propose that limiting miR target searches to transcripts containing a miR's progenitor TE can facilitate accurate target identification. In this report we outline the methodology behind OrbId (Origin-based identification of microRNA targets). In stark contrast to the principal miR target algorithms (which rely heavily on target site conservation across species and are therefore most effective at predicting targets for older miRs), we find OrbId is particularly efficacious at predicting the mRNA targets of miRs formed more recently in evolutionary time. After defining the TE origins of >200 human miRs, OrbId successfully generated likely target sets for 191 predominantly primate-specific human miR loci. While only a handful of the loci examined were well enough conserved to have been previously evaluated by existing algorithms, we find ~80% of the targets for the oldest miR (miR-28) in our analysis contained within the principal Diana and TargetScan prediction sets. More importantly, four of the 15 OrbId miR-28 putative targets have been previously verified experimentally. In light of OrbId proving best-suited for predicting targets for more recently formed miRs, we suggest OrbId makes a logical complement to existing, conservation-based miR target algorithms.
Year: 2012 PMID: 23087843 PMCID: PMC3469430 DOI: 10.4161/mge.21617
Source DB: PubMed Journal: Mob Genet Elements ISSN: 2159-2543

Figure 1. MiR biology and origins. (A) MiR generation. MiRs can occur inter- or intragenically and be transcribed by either RNA Polymerase II or III. Following transcription, the “pre-miR” hairpin (middle) is excised from the initial transcript (or pri-miR) (top) by Drosha. Once in the cytoplasm, the hairpin or stem loop is cleaved and denatured by Dicer to excise the ~20 nt mature miR (bottom). (B) MiR seeds. A seed match between a miR (top) and target mRNA (bottom) is illustrated. The nucleotides in a miR generally referred to as a “seed” (nts 2 through 8) and a “seed match” in an mRNA are depicted in red. Basepairing is indicated by vertical lines. (C) Cartoon depicting the molecular origin of many miR loci. MiRs were initially formed by the neighboring insertions of related TEs. A pri-miR is depicted just above the genome with an arrow indicating readthrough Pol-III transcription from a (+) strand Alu SINE into a neighboring (-) strand Alu. As illustrated, transcriptional readthrough would generate an RNA stem loop whose stems (loaded into the RISC machinery if processed) would correspond to the terminal nucleotides of the neighboring Alus. Figure adapted from.
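The seed match described in panel (B) can be illustrated with a minimal sketch. It assumes a perfect Watson-Crick match to miR nucleotides 2 through 8 and ignores everything else about target recognition; the function names and the toy sequences are hypothetical and are not taken from the paper.

```python
def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an RNA sequence (A, C, G, U)."""
    pairs = {"A": "U", "U": "A", "G": "C", "C": "G"}
    return "".join(pairs[base] for base in reversed(seq))


def find_seed_matches(mir: str, mrna: str) -> list[int]:
    """Return 0-based start positions in mrna of perfect seed matches."""
    seed = mir[1:8]                   # nts 2 through 8 in 1-based numbering
    site = reverse_complement(seed)   # sequence the mRNA must contain, 5'->3'
    hits = []
    start = mrna.find(site)
    while start != -1:
        hits.append(start)
        start = mrna.find(site, start + 1)
    return hits


# Toy sequences for illustration only (assumption, not data from the paper).
toy_mir = "AAGGAGCUCACAGUCUAUUGAG"
toy_mrna = "GGCAUGAGCUCCAAUGAGCUCCUU"
seed_hits = find_seed_matches(toy_mir, toy_mrna)
```

Here the first candidate site in `toy_mrna` differs from the seed match at its last base, so only the second site is reported.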

Figure 2. Establishing a miR regulatory network. MiR regulatory networks are formed when an advantageous regulation arises from a series of random TE insertions into expressed genomic loci, and the formation of a TE juxtaposition by the positive and negative strand insertions of related TEs. Thick lines indicate genomic DNA and thin lines denote RNA. Figure adapted from.
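The TE juxtaposition in this figure, two related TEs inserted next to each other on opposite strands, can be sketched as a simple scan over annotated insertions. This is only an illustration under stated assumptions: "related" is reduced to sharing a family label, and the `TEInsertion` record, the `max_gap` cutoff, and the coordinates are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TEInsertion:
    family: str   # family label, e.g. "Alu" (illustrative)
    start: int    # genomic start coordinate
    end: int      # genomic end coordinate
    strand: str   # "+" or "-"


def find_juxtapositions(insertions, max_gap=100):
    """Return adjacent pairs of same-family TEs on opposite strands.

    max_gap is a hypothetical cutoff on the distance between the elements.
    """
    ordered = sorted(insertions, key=lambda te: te.start)
    pairs = []
    for left, right in zip(ordered, ordered[1:]):
        same_family = left.family == right.family
        opposite_strands = left.strand != right.strand
        gap = right.start - left.end
        if same_family and opposite_strands and 0 <= gap <= max_gap:
            pairs.append((left, right))
    return pairs


# Toy annotations (assumption, not data from the paper).
toy_insertions = [
    TEInsertion("Alu", 1000, 1300, "+"),
    TEInsertion("Alu", 1340, 1640, "-"),
    TEInsertion("L2", 5000, 5400, "+"),
    TEInsertion("L2", 9000, 9400, "+"),
]
juxtaposed = find_juxtapositions(toy_insertions)
```

Only the two Alu elements qualify: the L2 elements sit on the same strand and are too far apart to form a hairpin on readthrough transcription.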
Table 1. OrbId summary. The full Ensembl set of 178,375 unique human mRNA transcripts, including 5′ UTR, 3′ UTR, and ORF annotations, was compiled and retrieved using the Biomart mining utility. “Human miRs analyzed” corresponds to the full set of human miR mature sequences identified by Borchert et al. as originating from TEs, obtained from the miR Registry miRBase.
| Metric | Value |
|---|---|
| Total human miRs analyzed | 208 |
| MiRs with predicted targets | 191 |
| Average # of predicted mRNA targets | 7.9 |
| Median # of mRNA targets | 3 |
| Max # of mRNA targets | 94 |
| Min # of mRNA targets | 1 |
| | |
| Total # of human transcripts assessed | 178,375 |
| Mean transcript length | 1151 nt |
| Total # of 3′ UTR targets | 970 |
| Mean 3′ UTR length | 386 nt |
| Total # of 5′ UTR targets | 410 |
| Mean 5′ UTR length | 117 nt |
| Total # of ORF targets | 149 |
| Mean ORF length | 647 nt |

Figure 3. MiR-28 predicted target three-way alignments. Alignments between OrbId predicted miR-28 target mRNAs (middle), a consensus L2B LINE (L2Plat1o) (top), and miR-28 (bottom). (*), base identity in the three aligning sequences. (^), base identity (indicating base pairing) between the miR and mRNA target only. (:), GU basepairing between miR and mRNA target. 3′ UTR or 5′ UTR targeting is indicated. Uracils are shown as thymines and UTRs have been reverse complemented for illustrative purposes.
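The annotation symbols in this caption can be reproduced with a short sketch. It assumes three pre-aligned, equal-length strings written as in the figure (uracils as thymines, mRNA reverse complemented), so that base pairing appears as identity. Under that convention a GU pair would appear as miR `G` over displayed `A`, or miR `T` over displayed `C`; that mapping is an inference from the caption, and the function name and toy sequences are hypothetical.

```python
# GU wobble pairs as they appear when the mRNA is shown reverse complemented
# with U written as T: (miR base, displayed mRNA base). Inferred convention.
WOBBLE = {("G", "A"), ("T", "C")}


def annotate(te: str, mrna: str, mir: str) -> str:
    """Return one symbol per alignment column: '*', '^', ':' or ' '."""
    if not (len(te) == len(mrna) == len(mir)):
        raise ValueError("aligned sequences must have equal length")
    marks = []
    for t, m, r in zip(te, mrna, mir):
        if "-" in (m, r):
            marks.append(" ")                       # gap: no pairing
        elif m == r:
            marks.append("*" if t == m else "^")    # three-way vs miR/mRNA only
        elif (r, m) in WOBBLE:
            marks.append(":")                       # GU base pair
        else:
            marks.append(" ")                       # mismatch
    return "".join(marks)


# Toy aligned sequences (assumption, not taken from the figure).
toy_te = "AAGCAGCT"
toy_mrna = "AAGGAACT"
toy_mir = "AAGGAGCA"
symbols = annotate(toy_te, toy_mrna, toy_mir)
```

For the toy input, column 4 is a miR/mRNA match the TE consensus does not share, column 6 is a wobble pair, and the last column is a mismatch.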
Table 2. OrbId prediction set for select TE-derived human miRs. “miR Name” refers to miRBase annotation while “Ensembl Gene ID” and “Gene Name” were obtained using the Biomart mining utility. “Diana, TS” refers to whether a predicted target is contained within publicly accessible Diana (D) and TargetScan (TS) predictions. “Region” refers to the location of a predicted target site within a given mRNA. MiR-28–5p corresponds to the participating member of the miR-28 family. MiR-1254–1 is a member of the Alu-miR family. MiR-603 is a member of the miR-548 family.
| miR Name | Ensembl Gene ID | Gene Name | DIANA, TS | Region |
|---|---|---|---|---|
| hsa-mir-28–5p | ENSG00000164136 | IL15 | | 5′ UTR |
| | ENSG00000180957 | PITPNB | | 5′ UTR |
| | ENSG00000108309 | RUNDC3A | | 3′ UTR |
| | ENSG00000106608 | URGCP | D,TS | 3′ UTR |
| | ENSG00000122741 | DCAF10 | D,TS | 3′ UTR |
| | ENSG00000144043 | TEX261 | D,TS | 3′ UTR |
| | ENSG00000152578 | GRIA4 | D,TS | 3′ UTR |
| | ENSG00000134046 | MBD2 | | 3′ UTR |
| | ENSG00000117598 | LPPR5.1 | D,TS | 3′ UTR |
| | ENSG00000124466 | LYPD3 | D,TS | 3′ UTR |
| | ENSG00000102921 | N4BP1 | D,TS | 3′ UTR |
| | ENSG00000169016 | E2F6 | D,TS | 3′ UTR |
| | ENSG00000135999 | EPC2 | D | 3′ UTR |
| | ENSG00000123472 | ATPAF1 | D,TS | 3′ UTR |
| | ENSG00000116641 | DOCK7 | | 3′ UTR |
| hsa-mir-301a-5p | ENSG00000105856 | HBP1 | D | 5′ UTR |
| | ENSG00000175445 | LPL | | ORF |
| | ENSG00000082175 | PGR | | ORF |
| | ENSG00000166004 | KIAA1731 | | ORF |
| | ENSG00000136573 | BLK | | ORF |
| hsa-mir-544a | ENSG00000144560 | VGLL4 | D,TS | 5′ UTR |
| | ENSG00000140632 | GLYR1 | | ORF |
| | ENSG00000078018 | MAP2 | D,TS | ORF |
| | ENSG00000197279 | ZNF165 | | ORF |
| | ENSG00000130066 | SAT1 | | ORF |
| | ENSG00000183035 | CYLC1 | | ORF |
| | ENSG00000142178 | SIK1 | | ORF |
| | ENSG00000173681 | CXorf23 | | 3′ UTR |
| hsa-mir-603 | ENSG00000122692 | SMU1 | | 3′ UTR |
| | ENSG00000102781 | KATNAL1 | D,TS | 3′ UTR |
| | ENSG00000116205 | TCEANC2 | | 3′ UTR |
| | ENSG00000004468 | CD38 | | 3′ UTR |
| | ENSG00000226264 | HLA-DMB | | 3′ UTR |
| | ENSG00000183908 | LRRC55 | | 3′ UTR |
| | ENSG00000184040 | FAM23B.1 | | 3′ UTR |
| | ENSG00000148483 | TMEM236 | | 3′ UTR |
| | ENSG00000132623 | ANKRD5 | | 3′ UTR |
| | ENSG00000144455 | SUMF1 | | 3′ UTR |
| | ENSG00000215020 | AL591684.1 | | 3′ UTR |
| | ENSG00000215033 | AL603965.1 | | 3′ UTR |
| hsa-mir-1254–1 | ENSG00000081760 | AACS | | 5′ UTR |
| | ENSG00000167077 | MEI1 | | 5′ UTR |
| | ENSG00000238035 | AC138035.1 | | 5′ UTR |
| | ENSG00000160991 | ORAI2 | | 3′ UTR |

Figure 4. MiR-28, miR-151 and miR-708 target network. Only shared targets are depicted, including 14 of 15 miR-28–5p targets, 11 of 14 miR-151a-5p targets, and 4 of 13 miR-708 targets. Green lines indicate miR regulation.
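Counts of the form "14 of 15 targets shared" can be derived from per-miR target sets with a few set operations. The sketch below treats a target as shared if at least one other miR in the collection is also predicted to regulate it. The miR and gene labels are placeholders, not the prediction sets behind the figure.

```python
def shared_targets(targets_by_mir):
    """Map each miR to the subset of its targets predicted for another miR."""
    shared = {}
    for mir, targets in targets_by_mir.items():
        others = set()
        for other, other_targets in targets_by_mir.items():
            if other != mir:
                others |= set(other_targets)
        shared[mir] = set(targets) & others
    return shared


# Placeholder target sets (assumption, not data from the paper).
toy_targets = {
    "miR-A": {"GENE1", "GENE2", "GENE3"},
    "miR-B": {"GENE2", "GENE3", "GENE4"},
    "miR-C": {"GENE3", "GENE5"},
}
shared = shared_targets(toy_targets)
summary = {
    mir: f"{len(shared[mir])} of {len(toy_targets[mir])}" for mir in toy_targets
}
```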

Figure 5. OrbId methodology flowchart. A high-level overview of the steps taken to determine miR and transposable element concurrent alignments within the human transcriptome.
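The core idea of restricting the target search to transcripts that carry a miR's progenitor TE can be sketched in a few lines. This is a simplified illustration and not the OrbId implementation: it assumes TE-derived segments are already annotated per transcript, and it replaces the alignment step with an exact seed-match search inside those segments. All names, parameters, and sequences are hypothetical.

```python
def revcomp(seq: str) -> str:
    """Reverse complement of a DNA-alphabet sequence (U written as T)."""
    comp = {"A": "T", "C": "G", "G": "C", "T": "A"}
    return "".join(comp[base] for base in reversed(seq))


def orbid_like_candidates(mir, progenitor_te, transcripts, te_hits):
    """Report (transcript_id, position) where a seed match lies inside a
    segment annotated as the miR's progenitor TE.

    transcripts: {transcript_id: sequence}
    te_hits: {transcript_id: [(te_name, start, end), ...]}, 0-based, half-open
    """
    site = revcomp(mir[1:8])  # seed match for miR nts 2 through 8
    candidates = []
    for tid, seq in transcripts.items():
        for te_name, start, end in te_hits.get(tid, []):
            if te_name != progenitor_te:
                continue  # transcript segment derives from a different TE
            pos = seq.find(site, start, end)  # first match within the segment
            if pos != -1:
                candidates.append((tid, pos))
    return candidates


# Toy inputs (assumption, not data from the paper).
toy_mir = "AAGGAGCTCACAGTCTATTGAG"
toy_transcripts = {
    "tx1": "CCCCAGCTCCTGGGG",   # seed match inside the annotated TE segment
    "tx2": "AGCTCCTAAAAAAAA",   # seed match outside the annotated TE segment
    "tx3": "TTTAGCTCCTTTT",     # seed match, but segment is from another TE
}
toy_te_hits = {
    "tx1": [("L2", 2, 13)],
    "tx2": [("L2", 8, 15)],
    "tx3": [("Alu", 0, 13)],
}
candidates = orbid_like_candidates(toy_mir, "L2", toy_transcripts, toy_te_hits)
```

Only `tx1` is reported: `tx2` and `tx3` both contain the seed match, but not within a segment derived from the progenitor TE, which is the filter that separates this approach from a transcriptome-wide seed search.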