Abstract
BACKGROUND: miRNAs are small noncoding RNA molecules, ~21 nucleotides long, that are formed endogenously in most eukaryotes and mainly control their target genes post-transcriptionally by interacting with and silencing them. While many tools have been developed for animal miRNA target identification, plant miRNA target identification has seen limited development. Most existing plant tools are centered on exact complementarity matching, and very few consider other factors such as multiple target sites and the role of flanking regions.

RESULT: In the present work, a Support Vector Regression (SVR) approach has been implemented for plant miRNA target identification, utilizing position-specific dinucleotide density variation around the target sites to yield highly reliable results. The tool has been named p-TAREF (plant-Target Refiner). p-TAREF was rigorously compared with other plant target prediction tools and was found to perform better in several respects. Further, p-TAREF was run over experimentally validated miRNA targets from species such as Arabidopsis, Medicago, Rice and Tomato and detected them accurately, suggesting broad usability of p-TAREF across plant species. Using p-TAREF, target identification was done for the complete Rice transcriptome, supported by expression and degradome data. miR156 was found to be an important component of the Rice regulatory system, predominantly controlling genes associated with growth and transcription. The entire methodology has been implemented in a multi-threaded parallel architecture in Java to enable fast processing in both the web-server and standalone versions, which also allows it to run in concurrent mode even on a simple desktop computer. The web-server version additionally provides a facility to gather experimental support for predictions through on-the-spot expression data analysis.
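The abstract mentions dinucleotide density around target sites as the SVR input. The following is only a minimal sketch of what such a feature could look like, not the authors' encoding: the flank width, the normalization, and the function names `dinucleotide_density` and `flank_features` are all assumptions introduced here for illustration.

```python
from itertools import product

# The 16 possible RNA dinucleotides, in a fixed order (AA, AC, AG, AU, CA, ...).
DINUCLEOTIDES = ["".join(p) for p in product("ACGU", repeat=2)]


def dinucleotide_density(seq: str) -> dict:
    """Fraction of overlapping dinucleotide positions taken by each dinucleotide."""
    seq = seq.upper().replace("T", "U")  # accept DNA-style input as well
    counts = {d: 0 for d in DINUCLEOTIDES}
    total = 0
    for i in range(len(seq) - 1):
        pair = seq[i:i + 2]
        if pair in counts:  # skip ambiguous bases such as N
            counts[pair] += 1
            total += 1
    return {d: (counts[d] / total if total else 0.0) for d in DINUCLEOTIDES}


def flank_features(transcript: str, site_start: int, site_end: int, flank: int = 10) -> list:
    """Hypothetical feature vector: densities upstream then downstream of a site.

    `flank` (window width in nucleotides) is an assumed parameter, not a value
    taken from the paper.
    """
    upstream = transcript[max(0, site_start - flank):site_start]
    downstream = transcript[site_end:site_end + flank]
    up = dinucleotide_density(upstream)
    down = dinucleotide_density(downstream)
    return [up[d] for d in DINUCLEOTIDES] + [down[d] for d in DINUCLEOTIDES]
```

A vector of this kind (32 values here) could then be passed to any SVR implementation; which kernel and which windows p-TAREF actually uses is described in the paper itself, not in this sketch.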
Year: 2011 PMID: 22206472 PMCID: PMC3293931 DOI: 10.1186/1471-2164-12-636
Source DB: PubMed Journal: BMC Genomics ISSN: 1471-2164 Impact factor: 3.969
Figure 1. p-TAREF workflow. The figure illustrates the various working stages of p-TAREF, along with concurrency.
Figure 2. The p-TAREF web-server. The web-server provides a friendly interface for loading query sequences, with parameter settings that include the energy cut-off, the mismatch level allowed, the SVR kernel to be used, the number of processors to be used, etc. Its performance tab details all the performance measurements made for benchmarking p-TAREF and comparing it with other tools.
Figure 3. Snapshot of the standalone GUI version of p-TAREF. Like its web-server counterpart, the standalone GUI version provides concurrency and most of the features, enabling quick standalone scanning of batches and large amounts of sequence data. It also shows a progress bar to indicate the status of the analysis.
Impact of concurrency in p-TAREF.
| Mismatches allowed (rows) / Number of processors (columns) | 8 | 4 | 2 | 1 |
|---|---|---|---|---|
| 4 | 1 Hour 43 min | 3 Hours 21 min | 5 Hours 01 min | 8 Hours 37 min |
| 3 | 1 Hour 17 min | 3 Hours 00 min | 4 Hours 34 min | 6 Hours 07 min |
| 2 | 46 min | 2 Hours 21 min | 3 Hours 53 min | 5 Hours 42 min |
| 1 | 42 min | 1 Hour 52 min | 3 Hours 14 min | 4 Hours 21 min |
| 0 | 37 min | 1 Hour 14 min | 2 Hours 05 min | 3 Hours 01 min |
| Target-align | NA | NA | NA | 92 Hours 26 min |
p-TAREF was run over a total of 205 genes with different numbers of processors (Intel Xeon, 2.5 GHz clock speed). The last row compares it with Target-align, a serially coded, alignment-based tool available as a standalone program.
Figure 4. Impact of concurrency on execution speed. p-TAREF was run over a set of genes for target identification, with different numbers of processors added through concurrency. Concurrency caused a drastic reduction in processing time, which is highly beneficial when performing accurate transcriptome-wide analysis.
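The timings above can be turned into speedup figures with simple arithmetic. The sketch below uses only the 4-mismatch row and the Target-align row of the table; the helper names `to_minutes` and `speedup` are illustrative and do not come from p-TAREF.

```python
def to_minutes(hours: int, minutes: int) -> int:
    """Convert an 'H hours M min' table entry to minutes."""
    return hours * 60 + minutes


# Run times for 4 allowed mismatches, keyed by number of processors (from the table).
runtime_min = {
    8: to_minutes(1, 43),
    4: to_minutes(3, 21),
    2: to_minutes(5, 1),
    1: to_minutes(8, 37),
}
# Serial Target-align run on the same gene set (last table row).
target_align_min = to_minutes(92, 26)


def speedup(reference_min: float, observed_min: float) -> float:
    """How many times faster the observed run is than the reference run."""
    return reference_min / observed_min


# Speedup relative to the single-processor p-TAREF run, and parallel efficiency.
speedups = {n: speedup(runtime_min[1], t) for n, t in runtime_min.items()}
efficiency = {n: s / n for n, s in speedups.items()}

for n in sorted(speedups):
    print(f"{n} processors: speedup {speedups[n]:.2f}x, efficiency {efficiency[n]:.2f}")
print(f"Target-align vs. serial p-TAREF: {speedup(target_align_min, runtime_min[1]):.1f}x")
```

For this row, eight processors give roughly a 5x speedup over one processor (517 min versus 103 min), so the scaling is substantial but sub-linear.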
Performance comparison between psRNA-target, Target-align and p-TAREF.
| Measure | psRNA-target | | Target-align | | p-TAREF (polynomial kernel) | |
|---|---|---|---|---|---|---|
| TP | 81 | 119 | 64 | 103 | 104 | 262 |
| FN | 23 | 168 | 40 | 184 | 0 | 25 |
| TN | 119 | 119 | 119 | 119 | 119 | 119 |
| FP | 0 | 0 | 0 | 0 | 0 | 0 |
| Sn | 77.88 | 41.16 | 61.53 | 35.888 | 100 | 91.29 |
| Sp | 100 | 100 | 100 | 100 | 100 | 100 |
| MCC | 0.81 | 0.4146 | 0.678 | 0.4586 | 1 | 0.86 |
| ACU% | 89.68 | 58.620 | 82.06 | 50.800 | 100 | 93.84 |
*TP = True Positive, FP = False Positive, TN = True Negative, FN = False Negative, Sn = Sensitivity, Sp = Specificity, MCC = Matthews Correlation Coefficient, ACU% = Accuracy (%).
The tools were compared on experimentally validated targets derived from two different sources. On the given datasets, p-TAREF performed better than the tools it was compared with. Details of the performance testing are given in the text, in Additional File 1, and on the server's performance page. The observed MCC values suggest that the model implemented in p-TAREF is robust.
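The measures in the table follow from the confusion-matrix counts by their standard definitions. As a minimal sketch (the function name `classification_metrics` is chosen here for illustration), they can be recomputed as follows:

```python
import math


def classification_metrics(tp: int, fn: int, tn: int, fp: int) -> dict:
    """Standard confusion-matrix measures; Sn, Sp and accuracy are percentages."""
    sensitivity = 100.0 * tp / (tp + fn)
    specificity = 100.0 * tn / (tn + fp)
    accuracy = 100.0 * (tp + tn) / (tp + fn + tn + fp)
    # Matthews correlation coefficient; defined as 0 when any marginal is empty.
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {"Sn": sensitivity, "Sp": specificity, "MCC": mcc, "ACU%": accuracy}


# First psRNA-target column of the table: TP = 81, FN = 23, TN = 119, FP = 0.
metrics = classification_metrics(tp=81, fn=23, tn=119, fp=0)
for name, value in metrics.items():
    print(f"{name}: {value:.2f}")
```

Recomputing a column this way is a quick consistency check on the reported Sn, Sp, MCC and accuracy values.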
Performance comparison between TAPIR, Target-align and p-TAREF for Target-align/TAPIR Reference dataset for benchmarking.
| TAPIR | Target-align | p-TAREF | |||
|---|---|---|---|---|---|
| TP Rate % | 91.83 | 93.14 | 97.05 | 93.14 | 100 |
| FP Rate % | 81.47 | 88.97 | 84 | 57.8 | 56.2 |
*TP = True Positive; FP = False Positive.
The same benchmarking dataset and procedure were used for p-TAREF as had been used previously by the two tools. p-TAREF was found to perform better.
Figure 5. ROC plots for the classifier models of p-TAREF with 10-fold cross-validation. As the plots show, the classifier performed robustly, with high AUC values; the highest was observed for the polynomial kernel model. For cases A-F, two major experimentally validated data sources, Beuclair et al. (2010) and ASRP, were used to prepare the datasets. For cases G and H, tests were performed using the reference test set and protocol used by TAPIR and Target-align. The curves represent the following tests: A) Linear Kernel/ASRP B) Gaussian Kernel/ASRP C) Polynomial Kernel/ASRP D) Linear/Beuclair E) Gaussian/Beuclair F) Polynomial/Beuclair G) Target-align/(TAPIR/Target-align dataset) H) p-TAREF/(TAPIR/Target-align dataset).
Figure 6. miRNA target distribution in Oryza sativa. The major miRNA families found targeting genes in the rice transcriptome.
Figure 7. Graphical representation of the targets of miR156 in the rice transcriptome. All the targets shown here have an inverse expression correlation with miR156 whose absolute value is 0.8 or higher.
Identified targets of miR156 in the rice transcriptome.
| Transcript Id | Transcript annotation | Expression Correlation | SVR Score |
|---|---|---|---|
| LOC_Os08g41480.1 | SAM domain containing protein, putative, expressed | -0.92 | 4.494426 |
| LOC_Os10g34064.1 | retrotransposon protein, putative, unclassified | -0.92 | 3.55511 |
| LOC_Os08g38240.1 | transposon protein, putative, CACTA, En/Spm sub-class, expressed | -0.89 | 0.305948 |
| LOC_Os02g01250.1 | LSM domain containing protein, expressed | -0.88 | 3.846277 |
| LOC_Os03g10850.2 | FAD-linked sulfhydryl oxidase ALR, putative, expressed | -0.88 | 3.9567 |
| LOC_Os03g24410.1 | conserved hypothetical protein | -0.88 | 4.39943 |
| LOC_Os01g66940.1 | kinase, pfkB family, putative, expressed | -0.87 | 3.55511 |
| LOC_Os03g55610.1 | dof zinc finger domain containing protein, putative, expressed | -0.87 | 3.73214 |
| LOC_Os04g28090.1 | MYB family transcription factor, putative, expressed | -0.87 | 3.01288 |
| LOC_Os05g34730.1 | ethylene-responsive transcription factor ERF020, putative, expressed | -0.87 | 3.55511 |
| LOC_Os11g04730.1 | DNA-directed RNA polymerases I, II, and III subunit RPABC1, putative, expressed | -0.87 | 2.02559 |
| LOC_Os11g37080.1 | h/ACA ribonucleoprotein complex subunit 1-like protein 1, putative, expressed | -0.87 | 3.73214 |
| LOC_Os02g12580.1 | OsPP2Ac-3 - Phosphatase 2A isoform 3 belonging to family 1, expressed | -0.86 | 0.591197 |
| LOC_Os02g38200.1 | dehydrogenase, putative, expressed | -0.86 | 3.01288 |
| LOC_Os02g51880.1 | amine oxidase, putative, expressed | -0.86 | 3.08791 |
| LOC_Os03g55220.1 | bHelix-loop-helix transcription factor, putative, expressed | -0.86 | 1.31754 |
| LOC_Os03g63730.1 | RNA recognition motif containing protein, putative, expressed | -0.86 | 0.742023 |
| LOC_Os03g63730.1 | RNA recognition motif containing protein, putative, expressed | -0.86 | 2.63747 |
| LOC_Os06g41384.1 | zinc finger C-x8-C-x5-C-x3-H type family protein, expressed | -0.86 | 3.08791 |
| LOC_Os08g42620.1 | zinc finger DHHC domain-containing protein, putative, expressed | -0.86 | 3.01288 |
| LOC_Os09g29980.2 | transposon protein, putative, CACTA, En/Spm sub-class, expressed | -0.86 | 3.01288 |
| LOC_Os12g16130.1 | transposon protein, putative, unclassified, expressed | -0.86 | 0.656255 |
| LOC_Os02g26140.1 | microtubule-binding protein TANGLED1, putative, expressed | -0.85 | 1.95761 |
| LOC_Os06g02560.1 | growth-regulating factor, putative, expressed | -0.85 | 2.543472 |
| LOC_Os10g03640.1 | hypothetical protein | -0.85 | 2.54657 |
| LOC_Os10g41390.1 | protein kinase domain containing protein, expressed | -0.85 | 0.411063 |
| LOC_Os12g44130.1 | expressed protein | -0.85 | 0.934949 |
| LOC_Os10g41390.1 | protein kinase domain containing protein, expressed | -0.85 | 1.64854 |
| LOC_Os12g09280.1 | RNA polymerase subunit, putative, expressed | -0.85 | 2.54657 |
| LOC_Os01g08200.1 | ubiquitin carboxyl-terminal hydrolase 14, putative, expressed | -0.84 | 2.5038 |
| LOC_Os01g50340.1 | transposon protein, putative, unclassified, expressed | -0.84 | 1.22697 |
| LOC_Os03g10930.1 | ribosomal protein L51, putative, expressed | -0.84 | 0.264129 |
| LOC_Os03g17950.1 | expressed protein | -0.84 | 1.17905 |
| LOC_Os06g35530.1 | CGMC_GSK.8 - CGMC includes CDA, MAPK, GSK3, and CLKC kinases, expressed | -0.84 | 0.996875 |
| LOC_Os07g01540.1 | Ser/Thr protein phosphatase family protein, putative, expressed | -0.84 | 2.517799 |
| LOC_Os08g02540.1 | adenylate kinase, putative, expressed | -0.84 | 2.51778 |
| LOC_Os08g02730.1 | plant protein of unknown function domain containing protein, expressed | -0.84 | 1.78162 |
| LOC_Os08g04780.1 | amine oxidase, putative, expressed | -0.84 | 0.111441 |
| LOC_Os08g44380.1 | L1P family of ribosomal proteins domain containing protein, expressed | -0.84 | 2.54161 |
| LOC_Os09g25620.1 | CPuORF8 - conserved peptide uORF-containing transcript, expressed | -0.84 | 2.08723 |
| LOC_Os09g39020.1 | N-rich protein, putative, expressed | -0.84 | 1.58374 |
| LOC_Os10g33230.1 | RNA recognition motif containing protein, putative, expressed | -0.84 | 1.95032 |
| LOC_Os12g37380.1 | RNA pseudouridine synthase, putative, expressed | -0.84 | 2.08857 |
| LOC_Os01g04730.1 | ribosomal protein L24, putative, expressed | -0.83 | 2.62583 |
| LOC_Os01g09030.1 | 2-aminoethanethiol dioxygenase, putative, expressed | -0.83 | 2.63747 |
| LOC_Os01g16220.1 | Sad1/UNC-like C-terminal domain containing protein, putative, expressed | -0.83 | 1.68967 |
| LOC_Os01g41880.1 | hyaluronan/mRNA binding family domain containing protein, expressed | -0.83 | 2.50997 |
| LOC_Os03g27990.1 | STRUBBELIG-RECEPTOR FAMILY 7 precursor, putative, expressed | -0.83 | 2.03837 |
| LOC_Os03g28410.1 | ribosomal protein S2, putative | -0.83 | 1.94684 |
| LOC_Os04g30680.1 | conserved hypothetical protein | -0.83 | 1.94949 |
The listed targets have an inverse expression correlation with miR156 of absolute value 0.8 or higher.
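The filtering step behind this table can be sketched as below. This is an illustration under stated assumptions: the record does not say which correlation measure was used, so Pearson correlation is assumed here, and the function names and the toy expression profiles in the usage lines are hypothetical.

```python
import math


def pearson(x: list, y: list) -> float:
    """Pearson correlation coefficient of two equal-length expression profiles."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("profiles must have the same length (at least 2)")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # a flat profile carries no correlation signal
    return cov / math.sqrt(var_x * var_y)


def anticorrelated_targets(mirna_expr: list, target_expr: dict, cutoff: float = -0.8) -> dict:
    """Keep predicted targets whose correlation with the miRNA is <= cutoff."""
    kept = {}
    for transcript_id, profile in target_expr.items():
        r = pearson(mirna_expr, profile)
        if r <= cutoff:
            kept[transcript_id] = r
    return kept


# Hypothetical toy profiles across four conditions (not data from the paper).
mirna = [1.0, 2.0, 3.0, 4.0]
candidates = {"target_a": [4.0, 3.0, 2.0, 1.0], "target_b": [1.0, 2.0, 3.0, 4.0]}
print(anticorrelated_targets(mirna, candidates))
```

Only `target_a`, whose expression falls as the miRNA rises, passes the cutoff; `target_b` is positively correlated and is dropped.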
Top 20 most significant GO terms found associated with miR156 targets in the rice transcriptome.
| Rank | Cellular component | p-value | Molecular function | p-value | Biological function | p-value |
|---|---|---|---|---|---|---|
| 1 | cell wall | 2.20e-16 | RNA binding | 2.20e-16 | cellular protein metabolic process | 2.20e-16 |
| 2 | cytosolic large ribosomal subunit | 2.20e-16 | copper ion binding | 2.20e-16 | DNA replication | 2.20e-16 |
| 3 | ribosome | 2.20e-16 | aspartic-type endopeptidase activity | 2.20e-16 | response to cadmium ion | 2.20e-16 |
| 4 | ribonucleoprotein complex | 2.20e-16 | aspartate kinase activity | 2.20e-16 | DNA integration | 2.20e-16 |
| 5 | mitochondrial inner membrane | 2.20e-16 | DNA-directed DNA polymerase activity | 2.20e-16 | translation | 2.20e-16 |
| 6 | Golgi apparatus | 2.20e-16 | zinc ion binding | 2.20e-16 | cellular amino acid biosynthetic process | 2.20e-16 |
| 7 | cytosolic small ribosomal subunit | 2.20e-16 | ubiquitin thiolesterase activity | 2.20e-16 | microtubule-based movement | 2.20e-16 |
| 8 | nuclear pore | 2.20e-16 | microtubule motor activity | 2.20e-16 | cellular amino acid metabolic process | 2.20e-16 |
| 9 | mitochondrion | 2.20e-16 | triose-phosphate isomerase activity | 2.20e-16 | intracellular protein transport | 2.20e-16 |
| 10 | cytoplasm | 2.20e-16 | branched-chain-amino-acid transaminase activity | 2.20e-16 | protein import into nucleus, docking | 2.20e-16 |
| 11 | cytosol | 2.20e-16 | structural constituent of ribosome | 2.20e-16 | shoot development | 2.20e-16 |
| 12 | cytoskeleton | 2.20e-16 | nucleic acid binding | 2.20e-16 | proteolysis | 2.20e-16 |
| 13 | cytosolic ribosome | 2.20e-16 | translation initiation factor activity | 2.20e-16 | branched chain family amino acid metabolic process | 2.20e-16 |
| 14 | nucleolus | 2.20e-16 | DNA binding | 2.20e-16 | ubiquitin-dependent protein catabolic process | 2.956e-16 |
| 15 | plasma membrane | 5.21e-15 | glyceraldehyde-3-phosphate dehydrogenase activity | 2.20e-16 | embryo development ending in seed dormancy | 1.114e-15 |
| 16 | proteasome complex | 1.09e-14 | NAD binding | 4.27e-15 | vesicle-mediated transport | 3.006e-15 |
| 17 | COPI vesicle coat | 1.20e-13 | glyceraldehyde-3-phosphate dehydrogenase (NAD+) (phosphorylating) activity | 1.649e-14 | rRNA processing | 1.756e-14 |
| 18 | outer membrane | 1.39e-12 | ligase activity | 1.96e-14 | translational elongation | 2.675e-14 |
| 19 | protein complex | 1.43e-15 | unfolded protein binding | 2.25e-14 | protein folding | 4.47e-14 |
| 20 | small ribosomal subunit | 3.20e-12 | hydrolase activity, acting on acid anhydrides, in phosphorus-containing anhydrides | 2.67e-14 | response to hormone stimulus | 1.870e-13 |
The 20 top-scoring terms for each of the three GO categories are given with their associated significance scores (p-values).
Figure 8. Hypergeometric tests for enrichment of GO molecular function terms. Enrichment was assessed for the molecular functions associated with targets of miR156. The colored nodes are functional categories whose genes were found significantly enriched in the pool of miR156 targets. The darker the color, the more significant the enrichment.
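A hypergeometric enrichment test of the kind named in the caption can be sketched as follows. This shows the standard one-sided test, not the specific software or settings used by the authors; the function name, argument names and the toy numbers are illustrative assumptions.

```python
from math import comb


def hypergeom_enrichment_p(population: int, annotated: int, sample: int, hits: int) -> float:
    """One-sided hypergeometric p-value, P(X >= hits).

    population: genes in the background set
    annotated:  background genes carrying the GO term
    sample:     genes in the target list (e.g. predicted miRNA targets)
    hits:       target-list genes carrying the GO term
    """
    upper = min(annotated, sample)
    total = comb(population, sample)
    # Sum the exact probabilities of observing `hits` or more annotated genes.
    tail = sum(
        comb(annotated, k) * comb(population - annotated, sample - k)
        for k in range(hits, upper + 1)
    )
    return tail / total


# Toy numbers (hypothetical): 10 genes, 5 annotated, 5 sampled, all 5 hits.
print(hypergeom_enrichment_p(population=10, annotated=5, sample=5, hits=5))
```

Exact integer arithmetic with `math.comb` avoids rounding problems in the binomial coefficients; for transcriptome-scale inputs a statistics library would usually be preferred for speed.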