Casey R Richardson, Qing-Jun Luo, Viktoria Gontcharova, Ying-Wen Jiang, Manoj Samanta, Eunseog Youn, Christopher D Rock.
Abstract
BACKGROUND: MicroRNAs (miRNAs) and trans-acting small-interfering RNAs (tasi-RNAs) are small (20-22 nt long) RNAs (smRNAs) generated from hairpin secondary structures or antisense transcripts, respectively, that regulate gene expression by Watson-Crick pairing to a target mRNA and altering expression by mechanisms related to RNA interference. The high sequence homology of plant miRNAs to their targets has been the mainstay of miRNA prediction algorithms, which are limited in their predictive power for other kingdoms because miRNA complementarity is less conserved yet transitive processes (production of antisense smRNAs) are active in eukaryotes. We hypothesize that antisense transcription and associated smRNAs are biomarkers which can be computationally modeled for gene discovery.
Year: 2010 PMID: 20520764 PMCID: PMC2877095 DOI: 10.1371/journal.pone.0010710
Source DB: PubMed Journal: PLoS One ISSN: 1932-6203 Impact factor: 3.240
Figure 1. Abundance and topology of unique MPSS smRNA signatures with perfect matches to MIRNA hairpins.
A) Arabidopsis MIRNA hairpins. B) Rice MIRNA hairpins. smRNA signatures were obtained from the MPSS Plus Database [85] (http://mpss.udel.edu) and searched against MIRNA hairpin sequences (http://microrna.sanger.ac.uk) and reference genome sequences (http://www.ncbi.nlm.nih.gov) by BLAST [91]. The normalized abundance of unique MPSS signatures (Log2, transcripts per quarter million [TPQ]) was plotted as a function of the normalized position of signatures relative to the start of the miRNA site on each individual hairpin. Sense smRNAs are shown as open blue circles; antisense smRNAs are shown as closed red circles. A cartoon of a miRNA hairpin is shown under panel B to align the first nucleotide of the mature miRNA (coordinate “0” on X-axis, purple box) and the first nucleotide of the miRNA* (relative coordinate “1” on X-axis, yellow box) to the hairpin. See Datafile S1 for details.
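The abundance unit in the caption (Log2 of transcripts per quarter million) can be illustrated with a short Python sketch. It assumes TPQ means a raw count scaled to a library of 250,000 signatures; the function name and the example counts are hypothetical, and the MPSS database may already supply normalized values, so this only shows what the unit means.

```python
import math


def log2_tpq(raw_count: int, library_total: int) -> float:
    """Scale a raw signature count to transcripts per quarter million (TPQ),
    then return its base-2 logarithm.

    Assumption: TPQ = raw_count * 250,000 / library_total.
    """
    if raw_count <= 0 or library_total <= 0:
        raise ValueError("counts must be positive")
    tpq = raw_count * 250_000 / library_total
    return math.log2(tpq)


# Hypothetical library: a signature seen 40 times among 1,000,000 signatures
# corresponds to 10 TPQ before the log transform.
value = log2_tpq(40, 1_000_000)
```

A log scale is convenient here because signature abundances span several orders of magnitude.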
Figure 2. Normalized average percentage expression levels for 93 “ancient” (22 families) (A) and 68 recently evolved “new” (64 families) MIRNA genes (B), with miRNA* position as “0”.
The sense strand is colored red and the antisense strand blue. Note the abundant antisense signals mapping at or upstream of miRNA* sites (small arrow), and downstream sense signals for ancient MIRNA genes (large arrowhead), similar to those previously described for miRNA target genes [14]. See Datafile S2 for details.
Table 1. Tiling array signals for all Arabidopsis and rice MIRNA hairpins (footnote a).
Values are Log2 signal intensity per probe for each region of the miRNA hairpin, with the number of probes in parentheses.
| Species | miRNA site, sense | miRNA site, anti | miRNA* site, sense | miRNA* site, anti | Other smRNAs, sense | Other smRNAs, anti | No smRNA, sense | No smRNA, anti |
| | 0.96 (528) | 1.26 (484) | 0.52 (465) | 0.49 (531) | 0.64 (47) | 0.15 (41) | 0.58 (852) | 0.40 (780) |
| | 1.53 (44) | 0.86 (45) | 0.46 (52) | 1.43 (38) | 1.23 (7) | 0.34 (3) | 1.03 (78) | 1.11 (78) |
a: Only the tiling array signals for regions of miRNA hairpins mapped by MPSS smRNA signatures were counted. MPSS smRNA data were downloaded from http://mpss.udel.edu. See Datafile S3 for details.
b: Every probe is unique in its respective genome. A probe was counted as exclusively mapping to a region of the hairpin if a minimum of 11 contiguous nt in the probe overlapped with the 21 nt mature miRNA or miRNA* site, or 7 nt overlapped with the 17 nt MPSS smRNA signature.
c: Significantly different from combined no-smRNA signals, P < 0.0008 (Student's two-tailed t-test, equal variance model).
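The probe-assignment rule in the footnotes (at least 11 contiguous nt of overlap with a 21 nt miRNA or miRNA* site, or at least 7 nt with a 17 nt MPSS signature) amounts to an interval-overlap check. A minimal Python sketch, assuming 1-based inclusive coordinates on the same strand; the 25 nt probe length and all coordinates below are hypothetical and are not taken from the source.

```python
MIRNA_MIN_OVERLAP = 11  # threshold for a 21 nt mature miRNA or miRNA* site
MPSS_MIN_OVERLAP = 7    # threshold for a 17 nt MPSS smRNA signature


def overlap_length(a_start: int, a_end: int, b_start: int, b_end: int) -> int:
    """Number of positions shared by two closed intervals (1-based, inclusive).

    The shared part of two intervals is itself one interval, so this length
    is the contiguous overlap.
    """
    return max(0, min(a_end, b_end) - max(a_start, b_start) + 1)


def probe_maps_to_region(probe: tuple, region: tuple, min_overlap: int) -> bool:
    """True if the probe overlaps the region by at least min_overlap nt."""
    return overlap_length(probe[0], probe[1], region[0], region[1]) >= min_overlap


# Hypothetical 25 nt probe and hairpin features.
probe = (100, 124)
mirna_site = (114, 134)       # 21 nt site, shares positions 114-124 with the probe
mpss_signature = (118, 134)   # 17 nt signature, shares positions 118-124

hits_mirna = probe_maps_to_region(probe, mirna_site, MIRNA_MIN_OVERLAP)
hits_mpss = probe_maps_to_region(probe, mpss_signature, MPSS_MIN_OVERLAP)
```

Shifting the miRNA site one position to the right, to (115, 135), would leave only 10 shared nt, and the probe would no longer count for that site.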
Table 2. Antisense transcription signals relative to sense strand expression from Arabidopsis whole genome tiling arrays (footnote a).
| Gene class | Genes with low sense/antisense exon signal ratio | Genes with high sense/antisense exon signal ratio | Ratio | P value |
| | 33 | 333 | 0.10 | 7.6e−64 |
| | 49 | 15 | 3.3 | 0.00002 |
| | 3879 | 2943 | 1.3 | 0.000001 |
| sORF | 426 | 210 | 2.0 | 3.7e−18 |
| with as-TU | 692 | 598 | 1.16 | 0.005 |
| qRT-PCR verified | 302 | 106 | 2.8 | 3.3e−23 |
| | 426 | 179 | 2.4 | 1.7e−24 |
| sORF | 280 | 142 | 2.0 | 8.6e−12 |
| with as-TU | 32 | 12 | 2.7 | 0.002 |
| qRT-PCR verified | 17 | 4 | 4.2 | 0.004 |
| | 274 | 99 | 2.8 | 1.1e−20 |
| sORF | 119 | 50 | 2.4 | 5.7e−8 |
| with as-TU | 33 | 13 | 2.5 | 0.002 |
| qRT-PCR verified | 9 | 3 | 3.0 | 0.07 |
| | 362 | 102 | 3.6 | 3.4e−36 |
| sORF | 1 | 0 | N.A. | N.A. |
| with as-TU | 43 | 11 | 3.9 | 0.00001 |
| qRT-PCR verified | 44 | 18 | 2.4 | 0.007 |
| | 412 | 137 | 3.0 | 1.4e−33 |
| sORF | 2 | 0 | N.A. | N.A. |
| with as-TU | 74 | 26 | 2.8 | 0.000001 |
| qRT-PCR verified | 39 | 11 | 3.5 | 0.00007 |
| | 257 | 107 | 2.4 | 1.2e−15 |
| sORF | 8 | 4 | 2.0 | 0.19 |
| with as-TU | 39 | 19 | 2.0 | 0.006 |
| qRT-PCR verified | 27 | 4 | 6.7 | 0.00004 |
| | 2148 | 2319 | 0.9 | 0.005 |
| sORF | 16 | 15 | 1.1 | 1.0 |
| with as-ncTU | 471 | 517 | 0.9 | 0.07 |
| qRT-PCR verified | 166 | 66 | 2.5 | 0.000001 |
| | 438 | 187 | 2.3 | 1.8e−24 |
| with as-TU | 56 | 31 | 1.8 | 0.005 |
| qRT-PCR verified | 53 | 10 | 5.3 | 0.000001 |
| | 982 | 720 | 1.4 | 0.000001 |
| with as-TU | 161 | 182 | 0.9 | 0.14 |
| qRT-PCR verified | 125 | 20 | 6.2 | 0.000001 |
Footnotes:
a: Gene annotation is from TAIR Release 9 (http://www.arabidopsis.org/). Arabidopsis whole genome tiling array data were from previous reports [82], [83]. For each gene, the ratio of sense/antisense exon signal is calculated according to the following formula: ratio = [(sense exon signals/probe numbers)/(antisense exon signals/probe numbers)]/[(sense intron signals/probe numbers)/(antisense intron signals/probe numbers)]. See Supplemental Text File S1 and Datafile S4 for details.
b: One-tailed binomial distribution, normal approximation model, except as noted.
c: Validated and predicted miRNA targets were extracted from the ASRP database for miRNAs 156, 162, 163, 168, 172, 393, 400, 403, 472, 773 and 780 (http://asrp.cgrb.oregonstate.edu). These targets produce significant numbers of antisense siRNAs [10], [12]–[14].
d: Genes reported as “unknown” were collected from the TAIR9 release for the Arabidopsis genome (http://www.arabidopsis.org).
e: Small open reading frames (sORFs) were from [27].
f: Genes with antisense transcript units were from [58].
g: Genes with antisense transcripts verified by quantitative RT-PCR were from Y. Xiao and C.D. Town, personal communication.
h: Unknown genes with different confidence ratings were from TAIR9 (http://www.arabidopsis.org). A zero rating means no expression data. A one-star rating means there is weak EST data and/or another type of low-quality functional evidence. Higher (2–5 star) rankings derive from qualitative meta-analysis of full-length cDNAs, proteomics, monocot and dicot cross-species sequence alignments, and genomic conservation.
i: Protein-coding genes with antisense smRNAs were from [107]; see Datafile S4.
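The significance column relies on a one-tailed binomial test with a normal approximation. The Python sketch below shows one way such a test can be computed. The null proportion of 0.5 (a gene is equally likely to fall in the low-ratio or high-ratio group) and the continuity correction are assumptions, since the source does not state either; the function name is hypothetical.

```python
import math


def one_tailed_binomial_p(k: int, n: int, p0: float = 0.5) -> float:
    """Approximate upper-tail probability P(X >= k) for X ~ Binomial(n, p0).

    Uses the normal approximation with a continuity correction of 0.5.
    Both p0 = 0.5 and the correction are assumptions of this sketch.
    """
    if n <= 0 or not 0 <= k <= n:
        raise ValueError("need 0 <= k <= n and n > 0")
    mean = n * p0
    sd = math.sqrt(n * p0 * (1 - p0))
    z = (k - 0.5 - mean) / sd
    # Upper tail of the standard normal: 1 - Phi(z) = 0.5 * erfc(z / sqrt(2)).
    return 0.5 * math.erfc(z / math.sqrt(2))


# Counts from one sORF row of the table: 8 low-ratio genes versus 4 high-ratio genes.
low, high = 8, 4
p_value = one_tailed_binomial_p(low, low + high)
```

Under these assumptions the 8 versus 4 row gives a value close to 0.19, in line with the table, but that agreement does not establish that this is the exact procedure the authors used. The function tests the upper tail only, so a row where the low-ratio group is the smaller one would need the counts swapped.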
Table 3. New miRNA homologs and hairpin-like sequences found on the antisense strand of annotated protein-coding genes (footnote a).
| miRNA hairpin | Homologous genes with low sense/antisense exon signal ratio | Homologous genes with high sense/antisense exon signal ratio | TAIR9 annotation, position of homology | Expression data quality (star rating) | Antisense EST? | E-value homology of AGI sequence to cognate hairpin |
|
| AT2G19420 | Unknown, intron | 1 | 2 e−54 | ||
|
| AT2G19300 | None | Unknown, exon | 5 | 3 e−7 | |
| AT1G07650 | None | LRR-kinase, 3′UTR | 4 | AV529349 | 4 e−11 | |
|
| AT1G68870 | None | Unknown, exon | 5 | 5 e−8 | |
| AT2G21420 | None | Zinc-finger like | 5 | 2 e−16 | ||
|
| AT1G74458 | None | Unknown, exon | 4 | 4 e−24 | |
|
| AT1G66300 | AT1G66290 | F-box like, exon | 2; 1 | 2 e−34; 3 e−40 | |
| AT1G66310 | AT1G66640 | 5; 1 | 7e−24; 1 e−17 | |||
| AT1G66320 | 1 | 2e−21 | ||||
|
| AT4G24410 | Unknown, exon | 1 | BX820858 | 1 e−74 | |
|
| AT4G03050 | None | AOP3 | 5 | 4 e−10 | |
|
| AT4G13570 | None | HTA4 | 3 | 2 e−30 | |
|
| antisense-TU Group4327 | Prmtr At3g48030 | 5 e−9 | |||
|
| AT1G61230 (footnote g; including 11 candidate targets) | (including 2 candidate targets) | jacalin-like, exon | 2 | 9 e−19 | |
|
| AT2G06095 | None | Unknown, exon | 2 | EG435138 | 5 e−32 |
|
| AT1G55045 | Unknown, exon | 0 | phased | 0.03 | |
| AT5G26262 | 0 | smRNAs | ||||
|
| AT2G31141 | Unknown, exon | 5 | smRNAs | 3 e−20 |
Footnotes:
a: Gene annotation was from TAIR Release 9. For each gene, the ratio of sense/antisense exon signal is calculated according to the following formula: ratio = [(sense exon signals/probe numbers)/(antisense exon signals/probe numbers)]/[(sense intron signals/probe numbers)/(antisense intron signals/probe numbers)]. All Arabidopsis genes were ranked based on this sense/antisense exon signal ratio. See Supplemental Text File S1 and Datafile S4 for details. All listed genes produce antisense smRNAs except for AT1G68870, which has a sense smRNA [78], [85], [94]. AT1G74458 encodes a miR415 homologue on the sense strand. See http://mpss.udel.edu.
b: The star rating for gene expression is explained in the legend of Table 2.
c: Expression elevated in miRNA metabolism mutants hst-15 and hyl1-2 [94]. AT1G07650 was previously predicted as a target of miR404 [159].
d: Expression elevated in a miRNA metabolism mutant, hen1-1 [94].
e: Homologues AT4G03060/AOP2 and AT2G38810/HTA8 were previously described as evolutionarily related loci for miR826 and miR841, respectively [78]. Interestingly, AT4G03050/AOP3 is a source of smRNAs sequenced from immunoprecipitated AGO4 [96].
f: A 2.2 kb antisense non-coding RNA described by Matsui et al. [58] that overlaps with At3g48030 and its promoter.
g: Validated and predicted jacalin/lectin targets [77], [78], [94]. Genes with low sense/antisense exon signal ratio: AT1G52050, AT1G52060, AT5G28520, AT1G52120, AT1G52130, AT1G60130, AT5G38550, AT5G49870, AT5G49850, AT1G57570, AT1G60110. Genes with high sense/antisense exon signal ratio: AT2G25980, AT1G52070.
h: There is bioinformatic evidence that these are not bona fide miRNAs: miR414 is homologous to transposon ATHAT1 and rice ORSgTETN00400025 (E = 2 e−10) [105] (http://plantrepeats.plantbiology.msu.edu/search.html); miR783 is homologous to the AT1G46120 transposable element gene (E = 3 e−57) and maps between predicted F-box-like homologues AT1G66300 and AT1G66331; miR855 has significant homology to the antisense strand of miR401 (E = 2e−37; noted also in [65]), VANDAL17, and Gypsy_Ty3-like transposons (E = 1e−108).
i: Significant homology to unclassified rice transposon ORSgTETNOOT00686 (http://plantrepeats.plantbiology.msu.edu/search.html).
j: 7SL is the ncRNA component of the signal recognition particle involved in targeting and translocation of proteins to the endoplasmic reticulum. There are three 7SL homologues described in Arabidopsis; AT2G31141 produces abundant smRNAs from the sense strand and was previously described as Ath-383 7SL-like ncRNA [29].
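The sense/antisense exon signal ratio defined in the footnotes, and the ranking of genes by it, can be sketched in Python as follows. The class name, field names, gene labels and signal values are all hypothetical; only the formula itself comes from the text. The sketch assumes non-zero antisense and intron signals, since the source does not say how zero denominators were handled.

```python
from dataclasses import dataclass


@dataclass
class StrandSignals:
    """Summed tiling array signals and probe counts for one gene (hypothetical layout)."""
    sense_exon: float
    sense_exon_probes: int
    anti_exon: float
    anti_exon_probes: int
    sense_intron: float
    sense_intron_probes: int
    anti_intron: float
    anti_intron_probes: int


def sense_antisense_ratio(s: StrandSignals) -> float:
    """Exon sense/antisense signal per probe, normalized by the same quantity for introns."""
    exon = (s.sense_exon / s.sense_exon_probes) / (s.anti_exon / s.anti_exon_probes)
    intron = (s.sense_intron / s.sense_intron_probes) / (s.anti_intron / s.anti_intron_probes)
    return exon / intron


# Made-up example genes; the labels are not real AGI identifiers.
genes = {
    "geneA": StrandSignals(40.0, 10, 20.0, 10, 6.0, 6, 6.0, 6),
    "geneB": StrandSignals(10.0, 10, 30.0, 10, 6.0, 6, 6.0, 6),
}

# Rank genes from lowest to highest ratio, so that genes with relatively
# strong antisense exon signal come first.
ranked = sorted(genes, key=lambda name: sense_antisense_ratio(genes[name]))
```

Dividing by the intron ratio serves as a per-gene background correction: a gene scores low only when its antisense signal is enriched over exons relative to introns.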
Table 4. Accuracy, Sensitivity, Precision and Specificity of an expression-based Support Vector Machine for miRNA target gene prediction trained on 86 Arabidopsis miRNA target genes and 125 non-target paralogs.
| Combination | Accuracy | Sensitivity | Precision | Specificity |
| | 0.972 | 0.977 | 0.955 | 0.968 |
| | 0.972 | 0.977 | 0.955 | 0.968 |
| | 0.972 | 1.000 | 0.935 | 0.952 |
| | 0.697 | 0.314 | 0.844 | 0.960 |
| | 0.697 | 0.302 | 0.867 | 0.968 |
| | 0.970 | 1.000 | 0.945 | 0.960 |
| | 0.592 | 0.000 | NaN | 1.000 |
NaN: not a number, due to division by zero.
See Datafile S2 for details.
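The four measures in the table follow from a confusion matrix. A minimal Python sketch, assuming the standard definitions (the source does not spell them out). The counts in the usage lines are illustrative values chosen to be consistent with the 86 target genes and 125 non-target paralogs named in the caption; they are not figures reported in the paper.

```python
import math


def classification_metrics(tp: int, fp: int, tn: int, fn: int) -> dict:
    """Accuracy, sensitivity, precision and specificity from confusion-matrix counts.

    A measure whose denominator is zero is returned as NaN.
    """
    def safe_div(numerator: int, denominator: int) -> float:
        return numerator / denominator if denominator else math.nan

    return {
        "accuracy": safe_div(tp + tn, tp + fp + tn + fn),
        "sensitivity": safe_div(tp, tp + fn),   # fraction of true targets recovered
        "precision": safe_div(tp, tp + fp),     # fraction of predicted targets that are true
        "specificity": safe_div(tn, tn + fp),   # fraction of non-targets correctly rejected
    }


# Illustrative counts for 86 targets and 125 non-targets (not from the source).
good_model = classification_metrics(tp=84, fp=4, tn=121, fn=2)

# A classifier that never predicts a target: precision has a zero denominator.
all_negative = classification_metrics(tp=0, fp=0, tn=125, fn=86)
```

With the illustrative counts, the four rates round to 0.972, 0.977, 0.955 and 0.968. The all-negative case reproduces the pattern of the last table row: accuracy 125/211 (about 0.592), sensitivity 0, specificity 1, and an undefined precision because no gene is predicted as a target.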
Table 5. Accuracy of the Support Vector Machine in predicting Arabidopsis MIRNA genes based on energy, expression topology and smRNAs.
| Test | Accuracy of Prediction |
| | 0.841 |
| | 0.765 |
| | 0.808 |
See Datafile S2 for details.