OrbId: Origin-based identification of microRNA targets.

Teresa J Filshtein, Craig O Mackenzie, Maurice D Dale, Paul S Dela-Cruz, Dale M Ernst, Edward A Frankenberger, Chunyan He, Kaylee L Heath, Andria S Jones, Daniel K Jones, Edward R King, Maggie B Maher, Travis J Mitchell, Rachel R Morgan, Sirisha Sirobhushanam, Scott D Halkyard, Kiran B Tiwari, David A Rubin, Glen M Borchert, Erik D Larson.

Abstract

MicroRNAs coordinate networks of mRNAs, but predicting specific sites of interactions is complicated by the very few bases of complementarity needed for regulation. Although efforts to characterize the specific requirements for microRNA (miR) regulation have made some advances, no general model of target recognition has been widely accepted. In this work, we describe an entirely novel approach to miR target identification. The genomic events responsible for the creation of individual miR loci have now been described with many miRs now known to have been initially formed from transposable element (TE) sequences. In light of this, we propose that limiting miR target searches to transcripts containing a miR's progenitor TE can facilitate accurate target identification. In this report we outline the methodology behind OrbId (Origin-based identification of microRNA targets). In stark contrast to the principal miR target algorithms (which rely heavily on target site conservation across species and are therefore most effective at predicting targets for older miRs), we find OrbId is particularly efficacious at predicting the mRNA targets of miRs formed more recently in evolutionary time. After defining the TE origins of > 200 human miRs, OrbId successfully generated likely target sets for 191 predominately primate-specific human miR loci. While only a handful of the loci examined were well enough conserved to have been previously evaluated by existing algorithms, we find ~80% of the targets for the oldest miR (miR-28) in our analysis contained within the principal Diana and TargetScan prediction sets. More importantly, four of the 15 OrbId miR-28 putative targets have been previously verified experimentally. In light of OrbId proving best-suited for predicting targets for more recently formed miRs, we suggest OrbId makes a logical complement to existing, conservation based, miR target algorithms.

Year:  2012        PMID: 23087843      PMCID: PMC3469430          DOI: 10.4161/mge.21617

Source DB:  PubMed          Journal:  Mob Genet Elements        ISSN: 2159-2543


Introduction

During the latter half of the 20th century, one of the greatest achievements in genetic research was the meticulous cataloging of epistatic relationships between genetic loci. While new relationships brought new insights, they also created massive networks of seemingly endlessly interacting genetic pathways. In 1993, however, Lee et al. described an entirely new short noncoding RNA that, despite its size, would ultimately be recognized as an important player in deciphering complex genetic interactions. These small microRNAs (miRs) are only ~20 nts in length (Fig. 1A) and are capable of coordinating the expression of networks of messenger RNAs (mRNAs) through complementary basepairing. Strikingly, over 1,900 unique human miRs have been cloned since the first were discovered in 2001. As such, it is of little surprise that miR research has seen a recent explosion of interest, especially considering that a single miR has the potential to control expression of dozens of genes and that miR misregulations are commonly associated with oncogenesis (recently reviewed in ref. 6).

Figure 1. MiR biology and origins. (A) MiR generation. MiRs can occur inter- or intragenically and be transcribed by either RNA Polymerase II or III. Following transcription, the “pre-miR” hairpin (middle) is excised from the initial transcript (or pri-miR) (top) by Drosha. Once in the cytoplasm, the hairpin or stem loop is cleaved and denatured by Dicer to excise the ~20 nt mature miR (bottom). (B) MiR seeds. A seed match between a miR (top) and target mRNA (bottom) is illustrated. The nucleotides in a miR generally referred to as a “seed” (nts 2 through 8) and a “seed match” in a mRNA are depicted in red. Basepairing is indicated by vertical lines. (C) Cartoon depicting the molecular origin of many miR loci. MiRs were initially formed by the neighboring insertions of related TEs. A pri-miR is depicted just above the genome with an arrow indicating readthrough Pol-III transcription from a (+) strand Alu SINE into a neighboring (-) strand Alu. As illustrated, transcriptional readthrough would generate a RNA stem loop whose stems (loaded into the RISC machinery if processed) would correspond to the terminal nucleotides of the neighboring Alus. Figure adapted from.

Whereas novel miR discovery has been forthcoming, deciphering miR regulations has proven exceptionally challenging. This is largely due to miRs requiring very little sequence complementarity to the mRNAs they coordinate. In contrast to siRNAs, which depend upon almost perfect complementarity to direct message degradation, miR target recognition and consequent repression can be mediated through as few as 7 bps of complementarity. Generally thought to occur most frequently in the 5′ miR sequence, these 7 participating nts are typically referred to as the miR “seed” and the complement in a mRNA as the “seed match” (Fig. 1B).
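As a minimal illustration of the seed and seed match concepts described above, the following sketch extracts nts 2 through 8 of a miR and scans an mRNA for perfectly complementary sites. The sequences and function names are invented for illustration only and are not taken from this study; OrbId itself does not rely on seed matching.

```python
# Illustrative only: the sequences and helper names below are hypothetical.
COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}

def reverse_complement(rna):
    """Return the reverse complement of an RNA sequence (A/C/G/U only)."""
    return "".join(COMPLEMENT[base] for base in reversed(rna))

def seed(mir):
    """Nucleotides 2 through 8 of the mature miR (1-based positions)."""
    return mir[1:8]

def find_seed_matches(mir, mrna):
    """Return 0-based start positions of perfect seed matches in the mRNA."""
    site = reverse_complement(seed(mir))
    positions = []
    start = mrna.find(site)
    while start != -1:
        positions.append(start)
        start = mrna.find(site, start + 1)
    return positions

toy_mir = "AUGCUAGCUAGGCUAAUCGGA"   # made-up 21 nt miR
toy_mrna = "AAACGCUAGCAGGGUU"       # made-up mRNA fragment with one seed match
print(seed(toy_mir), find_seed_matches(toy_mir, toy_mrna))
```

Because a 7 nt site is so short, searches of this kind return many candidate sites per transcriptome, which is the difficulty the remainder of this section addresses.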
The recurrent observation of complementarity between seed and seed match in a few initially characterized miR-target interactions led to the majority of miR target recognition algorithms basing target searches on perfect seed matches. Following this, most algorithms differ primarily by the significance they attribute to seed match conservation between species, the presence of multiple seed matches in a given mRNA target, and the extent of complementarity between the proposed target and remainder of the miR (recently reviewed in refs. 10-14). While algorithms have been developed that do not require target site conservation across species (focusing instead on thermodynamic stability and target site secondary structure, e.g., PITA and rna22), the principal, most widely accepted target prediction algorithms (DIANA-microT, miRanda, PicTar, and TargetScan) each incorporate target site conservation into their prediction methodologies. Although efforts to characterize the specific requirements for miR target recognition continue to advance, to date the principal target algorithms typically suggest several hundred putative mRNA targets for each individual miR. As such, no model of miR target prediction has been widely accepted. Similar in rationale to the principal miR target prediction algorithms (although not requiring target site conservation across species), we have developed an entirely novel approach to miR target identification. First suggested by Smalheiser and Torvik, the molecular events responsible for the genomic formation of many miR loci from transposable element (TE) sequences have now been described (Fig. 1C). Having recently performed a series of detailed genomic analyses describing the TE origins of ~2,400 distinct miRs, we hypothesized that a miR and its mRNA target sites might actually be formed in parallel by the ongoing colonization of a common ancestral transposable element (Fig. 2).
In light of this, we propose that limiting miR target searches to mRNAs containing the TE initially giving rise to a miR can significantly hone accurate target identification. In this work we outline the methodology behind, and initial findings for, a novel miR target prediction strategy: OrbId (Origin-based Identification of microRNA targets). In all, we have successfully generated target sets for 191 unique miRs after applying OrbId to a set of 208 distinct human miRs of defined TE origin. While the majority of OrbId putative targets were for recently formed miR loci, we did generate targets for the evolutionarily older miR-28 family and find our results largely in agreement with both traditional target prediction strategies, and existing experimental evidence. Thus, the mRNA targets of a given miR can largely be predicted based on shared transposable element origins.

Figure 2. Establishing a miR regulatory network. MiR regulatory networks are formed when an advantageous regulation arises from a series of random TE insertions into expressed genomic loci, and the formation of a TE juxtaposition by the positive and negative strand insertions of related TEs. Thick lines indicate genomic DNA and thin lines denote RNA. Figure adapted from.

Results

Targets predicted for 92% of human miRs with defined TE origins

OrbId operates under the premise that a miR and its mRNA target sites were formed in parallel by the colonization of a common progenitor transposable element (Fig. 2). Utilizing this premise, we have successfully generated putative target sets for 191 of 208 human miRs with defined TE origins. In stark contrast to the principal miR target algorithms currently utilized (which typically predict several hundred putative mRNA targets for individual miRs), we find OrbId predicts significantly fewer mRNA targets per miR (average 7.9, median 3) (Table 1). In all, 59 miRs produced a single mRNA target, 120 distinct miRs were suggested to have between 2 and 25 target mRNAs, and 12 were predicted to target > 25 mRNAs (max = 94, putative targets for miR-574) (Table S1). In order to ensure strict adherence to the OrbId operating methodology, sequence alignments of unique mRNA target sites, miRs, and progenitor TEs were independently verified (Fig. 3, Table S2).

Table 1. OrbId summary. The full Ensembl set of 178,375 unique human mRNA transcripts including 5′UTR, 3′UTR, and ORF annotations were compiled and retrieved using the Biomart mining utility. “Human miRs analyzed” correspond to the full set of human miR mature sequences identified by Borchert et al. as originating from TEs and were obtained from the miR Registry miRBase.

| Measure | Value |
|---|---|
| Total human miRs analyzed | 208 |
| MiRs with predicted targets | 191 |
| Average # of predicted mRNA targets | 7.9 |
| Median # of mRNA targets | 3 |
| Max # of mRNA targets | 94 |
| Min # of mRNA targets | 1 |
| Total # of human transcripts assessed | 178,375 |
| Mean transcript length | 1151 nt |
| Total # of 3′UTR targets | 970 |
| Mean 3′UTR length | 386 nt |
| Total # of 5′UTR targets | 410 |
| Mean 5′UTR length | 117 nt |
| Total # of ORF targets | 149 |
| Mean ORF length | 647 nt |

Figure 3. MiR-28 predicted target three way alignments. Alignments between OrbId predicted miR-28 target mRNAs (middle), a consensus L2B LINE (L2Plat1o) (top), and miR-28 (bottom). (*), base identity in the three aligning sequences. (^), base identity (indicating base pairing) between the miR and mRNA target only. (:), GU basepairing between miR and mRNA target. 3′ UTR or 5′ UTR targeting is indicated. Uracils are shown as thymines and UTRs have been reverse complemented for illustrative purposes.

Target sites are generally not preferentially located in 3′ UTRs

MiRs have now been conclusively shown to regulate target mRNAs through interactions with 5′ untranslated regions (UTR) and open reading frame (ORF) sequences similar to 3′ UTR interactions. As such, it is somewhat surprising that most target prediction algorithms predominately screen mRNA 3′ UTRs for miR regulatory sites. In this analysis, we assessed all publicly available human mRNA sequence regardless of functional annotation. Strikingly, not only did we find strong evidence supporting 5′ UTR and ORF regulations, we did not observe a general bias for 3′ UTR target sites (Table 1). In all, of 1529 unique predicted regulations, 970, 410 and 149 are located within 3′ UTR, 5′ UTR and ORF sequences respectively. When the average lengths of human 3′ UTR (386 nt), 5′ UTR (117 nt) and ORF (647 nt) sequences are taken into consideration, we find no significant bias for targeting to occur in either UTR preferentially. However, we did find target sites were approximately 12 times as likely to occur in noncoding UTR sequences as in ORF coding regions. Importantly, while we observed no general bias for miR targeting of 3′ UTR, 5′ UTR or ORF sequences, individual miR families showed significant targeting preferences. Of note, the targets of the principal mariner transposon derived miR family (miR-548) were located almost exclusively (> 99%) within 3′ UTR sequences, and the targets of the principal LINE derived miR family (miR-28) were similarly biased to occur within 3′ UTR sequences (> 96%). In sharp contrast, the targets of a novel Alu SINE derived miR family (Alu-miR) were located predominately (> 81%) within 5′ UTR sequences (Table 2, Figure S1). Additionally, while less than 10% of putative targets were predicted to occur in ORFs (despite ORFs accounting for > 56% of the total transcript sequence examined) (Table 1), we identify two miRs, miR-544 and miR-301a-5p, which are predicted to preferentially (> 90%) target ORF sequences (Table 2, Figure S1).
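The length-normalized comparison above can be reproduced from the Table 1 values. The sketch below is one plausible reading of the calculation (target sites per nucleotide of mean region length); the text does not spell out the exact arithmetic used, so the function and variable names are assumptions.

```python
# Counts and mean lengths are taken from Table 1; the normalization
# shown here is an assumed reading of how the comparison was made.
targets = {"3' UTR": 970, "5' UTR": 410, "ORF": 149}
mean_len_nt = {"3' UTR": 386, "5' UTR": 117, "ORF": 647}

def density(regions):
    """Predicted target sites per nucleotide of mean region length."""
    sites = sum(targets[r] for r in regions)
    length = sum(mean_len_nt[r] for r in regions)
    return sites / length

utr_vs_orf = density(["3' UTR", "5' UTR"]) / density(["ORF"])
orf_share_of_targets = targets["ORF"] / sum(targets.values())
orf_share_of_sequence = mean_len_nt["ORF"] / sum(mean_len_nt.values())

print(f"UTR vs ORF site density ratio: {utr_vs_orf:.1f}")
print(f"ORF share of predicted targets: {orf_share_of_targets:.1%}")
print(f"ORF share of transcript sequence: {orf_share_of_sequence:.1%}")
```

Under this reading, the ratio comes out close to 12, the ORF share of targets falls just under 10%, and the ORF share of sequence is a little over 56%, in line with the figures quoted in the text.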

Table 2. OrbId prediction set for select TE-derived human miRs. “miR Name” refers to miRBase annotation while “Ensembl Gene ID” and “Gene Name” were obtained using the Biomart mining utility. “DIANA, TS” refers to whether a predicted target is contained within publicly accessible Diana (D) and TargetScan (TS) predictions. “Region” refers to the location of a predicted target site within a given mRNA. MiR-28–5p corresponds to the participating member of the miR-28 family. MiR-1254–1 is a member of the Alu-miR family. MiR-603 is a member of the miR-548 family.

| miR Name | Ensembl Gene ID | Gene Name | DIANA, TS | Region |
|---|---|---|---|---|
| hsa-mir-28–5p | ENSG00000164136 | IL15 | | 5′ UTR |
| | ENSG00000180957 | PITPNB | | 5′ UTR |
| | ENSG00000108309 | RUNDC3A | | 3′ UTR |
| | ENSG00000106608 | URGCP | D,TS | 3′ UTR |
| | ENSG00000122741 | DCAF10 | D,TS | 3′ UTR |
| | ENSG00000144043 | TEX261 | D,TS | 3′ UTR |
| | ENSG00000152578 | GRIA4 | D,TS | 3′ UTR |
| | ENSG00000134046 | MBD2 | | 3′ UTR |
| | ENSG00000117598 | LPPR5.1 | D,TS | 3′ UTR |
| | ENSG00000124466 | LYPD3 | D,TS | 3′ UTR |
| | ENSG00000102921 | N4BP1 | D,TS | 3′ UTR |
| | ENSG00000169016 | E2F6 | D,TS | 3′ UTR |
| | ENSG00000135999 | EPC2 | D | 3′ UTR |
| | ENSG00000123472 | ATPAF1 | D,TS | 3′ UTR |
| | ENSG00000116641 | DOCK7 | | 3′ UTR |
| hsa-mir-301a-5p | ENSG00000105856 | HBP1 | D | 5′ UTR |
| | ENSG00000175445 | LPL | | ORF |
| | ENSG00000082175 | PGR | | ORF |
| | ENSG00000166004 | KIAA1731 | | ORF |
| | ENSG00000136573 | BLK | | ORF |
| hsa-mir-544a | ENSG00000144560 | VGLL4 | D,TS | 5′ UTR |
| | ENSG00000140632 | GLYR1 | | ORF |
| | ENSG00000078018 | MAP2 | D,TS | ORF |
| | ENSG00000197279 | ZNF165 | | ORF |
| | ENSG00000130066 | SAT1 | | ORF |
| | ENSG00000183035 | CYLC1 | | ORF |
| | ENSG00000142178 | SIK1 | | ORF |
| | ENSG00000173681 | CXorf23 | | 3′ UTR |
| hsa-mir-603 | ENSG00000122692 | SMU1 | | 3′ UTR |
| | ENSG00000102781 | KATNAL1 | D,TS | 3′ UTR |
| | ENSG00000116205 | TCEANC2 | | 3′ UTR |
| | ENSG00000004468 | CD38 | | 3′ UTR |
| | ENSG00000226264 | HLA-DMB | | 3′ UTR |
| | ENSG00000183908 | LRRC55 | | 3′ UTR |
| | ENSG00000184040 | FAM23B.1 | | 3′ UTR |
| | ENSG00000148483 | TMEM236 | | 3′ UTR |
| | ENSG00000132623 | ANKRD5 | | 3′ UTR |
| | ENSG00000144455 | SUMF1 | | 3′ UTR |
| | ENSG00000215020 | AL591684.1 | | 3′ UTR |
| | ENSG00000215033 | AL603965.1 | | 3′ UTR |
| hsa-mir-1254–1 | ENSG00000081760 | AACS | | 5′ UTR |
| | ENSG00000167077 | MEI1 | | 5′ UTR |
| | ENSG00000238035 | AC138035.1 | | 5′ UTR |
| | ENSG00000160991 | ORAI2 | | 3′ UTR |

LINE L2B (miR-28) family.

First identified in 2003 as arising from L2B LINE elements, miR-28 and miR-151 have long been recognized as being related, and their numerous representative sequences across mammalia are collectively referred to as the miR-28 family. Supporting this relationship, and despite there only being an ~10% likelihood that a given miR in this analysis would target the same mRNA as any other miR, we find ~76% of miR-28 and miR-151 proposed targets (11 of 15 and 11 of 14 respectively) common to both miRs (Fig. 4, Table S1). Our analyses also indicate the likelihood of a third, until now overlooked, member of the miR-28 family, miR-708. In addition to having initially formed from the same LINE element that gave rise to miR-28 and miR-151 and bearing significant pre-miR homology to both miR-28 and miR-151 (Figure S2), we find ~31% of miR-708 targets also constitute miR-28 family targets (Fig. 4, Table S1). Additionally, as the miR-28 family was the oldest in our analysis, miR-28 was one of the few miRs with publicly available Diana and TargetScan predictions. Encouragingly, we find ~80% of our miR-28 target predictions contained within the principal Diana and TargetScan prediction sets (Table 2). Furthermore, over 25% of our putative miR-28 targets (4 of 15) have already been experimentally verified, with miR-28 shown to indeed regulate the mRNAs predicted by OrbId (ref. 29; data not shown).

Figure 4. MiR-28, miR-151 and miR-708 target network. Only shared targets are depicted including 14 of 15 miR-28–5p targets, 11 of 14 miR-151a-5p targets, and 4 of 13 miR-708 targets. Green lines indicate miR regulation.

MiRs formed from Alu repeats

In contrast to the miR-548 and miR-28 families, the targets of a novel Alu SINE derived miR family (miR-566) were located predominately (> 80%) within 5′ UTR sequences. Although not as closely related as the miR-28 family, these Alu-derived miRs share several target relationships. They may not constitute a traditional miR family based on common molecular origin, but they could be considered a family in the sense of common targeting. In all, miRs -566, -1254, -1268, -1273, -1285, -1968, -1972, and -1973 appear to establish a significant network of target regulations (Figure S3). Intriguingly, our findings are largely in agreement with previous reports suggesting that 3′UTR embedded Alu repeats frequently house novel, primate-specific miR target sites.

Discussion

The genomic events responsible for the initial formation of numerous miR loci have recently been described. The majority of these loci appear to have initially arisen from transposable element (TE) sequences. In addition to forming miR loci, we now hypothesize that TE mobilizations also generate miR regulatory networks by simultaneously integrating into existing mRNA expression cassettes (Fig. 2). Thus, the principal objective of this work was to utilize common TE ancestry to facilitate accurate prediction of miR-mRNA target interactions. To accomplish this, we have developed a novel methodology titled OrbId (Origin-based Identification of microRNA targets) (Fig. 5). OrbId contrasts sharply with current miR target algorithms, as these methodologies rely heavily on target site conservation across species and have therefore been primarily effective at predicting targets for well conserved miRs. OrbId is better suited for predicting the mRNA targets of evolutionarily younger miRs for which target site conservation searches are impractical. Examples include the 70 human miR loci known to have been formed from primate-specific Alu repeats, rodent-specific miRs formed from rodent-specific B1 SINEs, and the marsupial-specific miRs formed from marsupial-specific transposable elements.

Figure 5. OrbId methodology flowchart. A high level overview of the steps taken to determine miR and transposable element concurrent alignments within the human transcriptome.

OrbId may also prove valuable in identifying taxon-specific targets of more conserved miRs. Requiring target site conservation across species has been effective at predicting many of the targets for conserved miRs. By design, however, traditional conservation-based miR target algorithms miss any targets arising from TE mobilizations following the initial establishment of a miR regulatory network. For example, if ongoing TE colonizations occur following speciation events, separate species might well acquire distinct, novel targets for existing miRs. Although beyond the scope of this analysis, more comprehensive species-wide implementations of OrbId will be needed to fully evaluate the prevalence of such events. Future analyses will unquestionably broaden the range of OrbId utility as the existing repertoire of defined miR-TE relationships continues to expand through the ongoing characterizations of additional miR loci and novel TE sequences. Importantly, de Koning et al. recently suggested that over two-thirds of the human genome was actually formed from repetitive elements. While highly intriguing, the extent of the repetitive composition of the human genome remains a significant point of debate and attempts to fully clarify this issue remain ongoing. Should the work of de Koning et al. prove largely accurate, the incorporation of this information into current OrbId methodology would clearly result in marked increases in the definable number of putative miR::target relationships. Additionally, while this would likely predominately facilitate putative target identification for evolutionarily older miRs, it would also almost certainly require increased stringency to avoid concurrent increases in false positives.
As a result of electing to limit our OrbId analysis to identifying the targets of miRs whose TE origins have been clearly defined using RepBase annotations, this analysis was confined to the evaluation of ~16% of currently annotated human miR loci (resulting in target predictions for 191 unique human miRs). While our OrbId analysis primarily dealt with miRs predominately unexamined by the principal miR target prediction algorithms, in striking contrast to the hundreds of putative mRNAs generally predicted by the principal algorithms, OrbId averaged ~8 putative mRNA targets per unique miR. While the average number of mRNAs a typical miR regulates remains poorly defined, we suggest our predictions most likely only constitute a subset of actual miR regulations (largely due to the high degree of complementarity we required for putative target interaction). However, since OrbId target sets are derived through a rationale based on molecular origin, we suggest that the OrbId putative target lists reported here likely contain a markedly higher proportion of actual endogenous miR targets than the hundreds of predicted mRNA targets obtained through less stringent algorithms. Additionally, in terms of laboratory and clinical efforts, we suggest that a manageable number of likely endogenous relationships based on a molecular rationale is in many ways preferable to more encompassing sets of hundreds of putative targets. Importantly, > 95% of the miRs included in our analysis have not been examined by the principal target prediction algorithms (most likely due to either their repetitive nature or their being primate specific and not conserved across species). We do find, however, that the OrbId target predictions for the few miRs in our analysis that have previously been examined are largely in agreement with more established algorithms. For example, we find ~80% of our putative miR-28 targets are contained within the principal Diana and TargetScan predictions (Table 2).
Excitingly, four of our 13 putative miR-28 3′ UTR targets have previously been verified experimentally. Additionally, three of these experimentally verified miR-28 targets, N4BP1, E2F6, and TEX261, are expressed alongside miR-28 in blood cell lineages and have each been speculated to contribute to myeloproliferative neoplasms. While experimental corroborations such as these are encouraging, the majority of our novel OrbId miR target predictions will clearly ultimately require direct experimental validation. It is tempting to speculate, however, that experimental verification of many of our miR interactions might well be forthcoming. This work represents the first time putative target sets have been reported for the majority of the 191 distinct miRs examined in this analysis, thereby constituting the first real examination of potential target interactions for ~10% of all currently characterized human miRs. In conclusion, we report here a new approach for miR target prediction that relies on TE origins. In all probability, a universal description of miR target interaction has not yet been characterized because no such universal description exists. Complicating factors such as GU base-pairing, nucleotide editing, target secondary structure and RNA-interacting protein effects make strict thermodynamic modeling largely incapable of homing in on actual mRNA targets. Likely a closer estimation of true mRNA regulations, OrbId predicts far fewer mRNA targets per miR than existing algorithms through employing a molecular, origin-based rationale. Importantly, incorporating logical molecular cues such as target site conservation has previously been successfully exploited to circumvent the limitations of mathematical modeling alone. Similarly based on genetic rationale, this work introduces a novel consideration that helps to circumvent many of the difficulties in accurate target identification.
We suggest that since TEs are present in multiple copies across the genome, and miRs target sequences through complementary basepairing, requiring a miR target site to occur in the same TE from which a miR was initially formed represents a logical addition to miR target prediction. In contrast to the principal miR target algorithms currently utilized (which rely heavily on target site conservation across species and have therefore been primarily effective at predicting targets for well conserved miRs), OrbId has been designed to predict the mRNA targets of evolutionarily younger miRs and therefore makes a strategically logical complement to existing miR target algorithms.

Materials and Methods

Retrieving miR, transposable element mRNA and genomic sequences

In 2011, Borchert et al. established a connection between miRs and transposable elements (TEs), providing evidence for the role of repetitive elements in miR origin. Unique TEs associated with the origins of > 200 human miRs were retrieved from the data set created from the work of Borchert et al. and used as the basis for this analysis. Single FASTA files containing the full set of human miR mature sequences were downloaded from the miR Registry housed at Sanger (http://www.mirbase.org). Flanked genomic sequences were obtained for human miRs corresponding to genomes currently available in Ensembl (+/− 250 base pair flanks). Unique miR accession numbers from the miR Registry were attached to the corresponding flanked genomic sequence, which was then utilized as the origin-based TE sequence. Next, the full set of Ensembl human 5′UTR, 3′UTR, and ORF sequences were compiled and retrieved using the Biomart mining utility. Of 178,375 unique human transcripts, 68,892,718 nts corresponded to 3′ UTR sequence, 20,940,347 nts corresponded to 5′ UTR sequence and 115,422,049 nts corresponded to ORF sequence, making the average 3′ UTR, 5′ UTR and ORF lengths examined in this study 386, 117 and 647 nts respectively.
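The average region lengths quoted above follow directly from the totals. The short check below divides each total by the number of transcripts; the variable names are chosen here for illustration.

```python
# Totals as reported in the text; names are illustrative.
transcripts = 178_375
total_nt = {
    "3' UTR": 68_892_718,
    "5' UTR": 20_940_347,
    "ORF": 115_422_049,
}

# Mean length of each region per transcript, rounded to the nearest nt.
mean_nt = {region: round(nt / transcripts) for region, nt in total_nt.items()}
# Mean whole-transcript length, computed from the unrounded totals.
mean_transcript_nt = round(sum(total_nt.values()) / transcripts)

print(mean_nt, mean_transcript_nt)
```

The rounded results agree with the 386, 117 and 647 nt values in the text and with the 1151 nt mean transcript length listed in Table 1.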

Correlating miR target sites with progenitor TEs

It is important to note that all alignment analyses were run identically and in parallel by three independent research teams and cross-examined for verification. Significant alignments of the miR and TE sequences to the human 5′UTR, 3′UTR, and ORF sequences were obtained via BLAST (BLASTN 2.2.15 with -FF, -W7 flags). Beyond requiring a common molecular origin for each member of a putative miR::mRNA interaction, the majority of false positive relationships were avoided by requiring long, nearly perfect complementarities. Strongly agreeing with similarly stringent statistical searches for miR targets, this strategy resulted in the identification of numerous long runs of perfect complementarity between putative miRs and targets and found no significant bias for that complementarity to occur near miR 5′ ends or in mRNA 3′UTRs. For the miR sequences, significant alignments were strictly defined as ≥ 88% identity for hits of ≥ 17 bp or 100% identity for hits of 12–16 bp. For TE sequences, significant alignments were strictly defined as ≥ 70% identity for hits of 50 bp or more, or ≥ 80% identity for hits shorter than 50 bp. Using the following search algorithm, we determined alignment matches along the human 5′UTR, 3′UTR, and ORF sequences between each miR and its corresponding TE. Our algorithm examined each miR::mRNA alignment and searched for overlapping TE alignments in the same region of that transcript. If such TE alignments were found, the transcript was recorded as a target for that miR. We defined a miR as hitting the same region as its corresponding TE if either of the following two criteria was satisfied: (1) the miR ending alignment position was between the TE beginning and ending alignment positions (inclusive), or (2) the miR beginning alignment position was between the TE beginning and ending alignment positions (inclusive). In other words, if at least part of the miR alignment fell within the TE alignment region on a gene, the transcript was counted as a miR target (Figure S5).
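The filtering and overlap rules described above can be sketched as follows. This is a minimal illustration and not the authors' code: the `Hit` record, its field names, and the example hits are hypothetical, and it assumes BLAST hits have already been parsed into transcript coordinates with start ≤ end.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Hit:
    """One parsed alignment on a transcript (hypothetical record layout)."""
    transcript: str
    start: int       # first aligned transcript position
    end: int         # last aligned transcript position (start <= end assumed)
    length: int      # alignment length in bp
    identity: float  # percent identity

def significant_mir_hit(hit):
    """>= 88% identity for >= 17 bp hits, or 100% identity for 12-16 bp hits."""
    if hit.length >= 17:
        return hit.identity >= 88.0
    if 12 <= hit.length <= 16:
        return hit.identity == 100.0
    return False

def significant_te_hit(hit):
    """>= 70% identity for hits of 50+ bp, or >= 80% for shorter hits."""
    if hit.length >= 50:
        return hit.identity >= 70.0
    return hit.identity >= 80.0

def same_region(mir, te):
    """Criteria (1) and (2): miR end or miR start lies within the TE hit."""
    if mir.transcript != te.transcript:
        return False
    end_inside = te.start <= mir.end <= te.end
    start_inside = te.start <= mir.start <= te.end
    return end_inside or start_inside

def predicted_targets(mir_hits, te_hits):
    """Transcripts where a significant miR hit overlaps a significant TE hit."""
    mir_ok = [h for h in mir_hits if significant_mir_hit(h)]
    te_ok = [h for h in te_hits if significant_te_hit(h)]
    return sorted({m.transcript for m in mir_ok for t in te_ok if same_region(m, t)})

# Hypothetical hits for one miR and its progenitor TE.
mir_hits = [Hit("T1", 120, 140, 21, 95.2),   # passes, overlaps TE on T1
            Hit("T2", 10, 23, 14, 92.9),     # short hit below 100% identity
            Hit("T3", 500, 519, 20, 90.0)]   # passes, but no TE nearby
te_hits = [Hit("T1", 60, 200, 141, 78.0),
           Hit("T2", 1, 100, 100, 85.0),
           Hit("T3", 100, 180, 81, 90.0)]
print(predicted_targets(mir_hits, te_hits))
```

In this toy input only transcript T1 is reported, since it is the only one where a significant miR alignment begins or ends inside a significant TE alignment.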
Additionally, as a control, we randomly generated 10 scrambled sets of matched, size appropriate miR repeat pairs to search for targets using OrbId. Importantly, we identified no putative targets for scrambled controls in the human transcriptome.
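One way such scrambled controls could be generated is sketched below. The text does not specify the scrambling procedure, so shuffling each sequence in place (which preserves length and base composition) is an assumption, and the sequences and function names are hypothetical.

```python
import random

def scramble(seq, rng):
    """Shuffle a sequence, preserving its length and base composition."""
    bases = list(seq)
    rng.shuffle(bases)
    return "".join(bases)

def scrambled_pairs(mir, te, n=10, seed=0):
    """Generate n matched (scrambled miR, scrambled TE) control pairs."""
    rng = random.Random(seed)  # fixed seed so the controls are reproducible
    return [(scramble(mir, rng), scramble(te, rng)) for _ in range(n)]

# Made-up sequences standing in for a miR and its progenitor TE.
toy_mir = "AUGCUAGCUAGGCUAAUCGGA"
toy_te = "GGCCGGGCGCGGUGGCUCACGCCUGUAAUCCCAGCACUUUGGGAGGC"
controls = scrambled_pairs(toy_mir, toy_te)
print(len(controls), controls[0][0])
```

Each control pair would then be run through the same alignment and overlap steps as a real miR and TE pair.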