
MiRGator v3.0: a microRNA portal for deep sequencing, expression profiling and mRNA targeting.

Sooyoung Cho, Insu Jang, Yukyung Jun, Suhyeon Yoon, Minjeong Ko, Yeajee Kwon, Ikjung Choi, Hyeshik Chang, Daeun Ryu, Byungwook Lee, V Narry Kim, Wankyu Kim, Sanghyuk Lee.

Abstract

Biogenesis and molecular function are two key subjects in the field of microRNA (miRNA) research. Deep sequencing has become the principal technique for cataloging the miRNA repertoire and generating expression profiles in an unbiased manner. Here, we describe the miRGator v3.0 update (http://mirgator.kobic.re.kr), which compiles publicly available miRNA deep sequencing data and implements several novel tools to facilitate exploration of these massive datasets. The miR-seq browser lets users examine short read alignments together with secondary structure and read count information in concurrent windows. Features such as sequence editing, sorting, ordering, and import and export of user data should be of great utility for studying iso-miRs, miRNA editing and modifications. The miRNA–target relation is essential for understanding miRNA function. Coexpression analysis of miRNAs and target mRNAs, based on miRNA-seq and RNA-seq data from the same sample, is visualized in heat-map and network views, where users can investigate the inverse correlation of gene expression alongside target relations compiled from various databases of predicted and validated targets. By keeping datasets and analytic tools up-to-date, miRGator should continue to serve as an integrated resource for biogenesis and functional investigation of miRNAs.

Substances：
MicroRNAs
RNA, Messenger

Year: 2012 PMID： 23193297 PMCID： PMC3531224 DOI： 10.1093/nar/gks1168

Source DB: PubMed Journal: Nucleic Acids Res ISSN： 0305-1048 Impact factor: 16.971

INTRODUCTION

Over the past 2 years, the number of known microRNAs (miRNAs) in human has almost tripled (1). The catalog of miRNA information is usually deposited in databases such as miRBase (1) and PMRD (2). In miRNEST (3), novel miRNA candidates are predicted from expressed sequence tag (EST) sequences in various animals, plants and viruses. miRNAs with related sequences are grouped into RNA families, as in Rfam (4). Regarding miRNA targets, validated targets are still sparse but are available at miRecords (5), Tarbase (6) and miRTarBase (7). Many target prediction methods have been developed, including TargetScan (8), microRNA.org (9), miRBase (1), PITA (10), PicTar (11), miRDB (12) and their combinations (13). These programs usually suffer from a large number of false positives. Other tools that provide analytic functions based on miRNA and mRNA expression profiles include HOCTAR (14) and miRFANS (15). The biology of miRNAs is turning out to be much more complex than initially thought: a single miRNA may have multiple isoforms (iso-miRs) and often undergoes modifications such as 3′-nucleotide addition (16). Comprehensive profiling of such miRNA variants is necessary to understand the function of miRNAs in the context of various human diseases and other perturbations. Deep sequencing is rapidly replacing hybridization-based methods because of its ability to catalog and quantify miRNAs (and their variants) in an unbiased and accurate manner. Accordingly, several web tools and databases, including deepBase (17), miRTools (18), miRanalyzer (19) and miRDeepFinder (20), were developed to analyze deep sequencing data. Even though deep sequencing has become the main driving force in uncovering novel miRNAs and expression changes, we still lack a comprehensive and integrated database of miRNA sequencing, expression profiling and targeting information, implemented with proper tools.
Here, we introduce miRGator v3.0, which consolidates an extensive collection of datasets from deep sequencing studies. The user interface has been fully renovated with a dedicated miRNA-seq browser and two novel viewers that let users examine miRNA–target relationships with expression correlation information readily accessible. We describe the main characteristics of the updated system in the following sections.

SYSTEM OVERVIEW

The schematic overview of miRGator v3.0 is shown in Figure 1. We have included publicly available deep sequencing data, which have become the principal resource for information on miRNA diversity and expression. The datasets were manually curated into ontology-based disease and tissue categories. In total, we compiled 73 studies with 4665 samples into 38 disease and 71 anatomic categories.

Figure 1.

System overview of miRGator v3.0.

Major features, summarized in Figure 1, include (i) miR-seq browser, which allows users to examine short read alignment for identifying iso-miRs and differential expression in multiple samples; (ii) expression profiles in various organs, tissues and diseases, based on deep sequencing data; (iii) novel representation of miRNA–target relations in correlation heat-maps and network views of gene expression and (iv) gene set analysis for functional annotation of miRNA-associated genes.

DATASETS AND PROCESSING OF SEQUENCING DATA

We have collected 73 deep sequencing datasets on human samples from Gene Expression Omnibus (GEO) (21), Short Read Archive (SRA) (22) and The Cancer Genome Atlas (TCGA) archives (23). GEO and SRA included 54 studies of miRNA and mRNA sequencing (716 samples and 4.1 billion short reads). Additionally, we added the expression profiles of miRNAs and mRNAs in cancer samples from the TCGA archive (19 studies, 3949 samples in 17 cancer types). TCGA data are particularly useful in investigating the inverse expression correlation of miRNA and target mRNAs in various types of cancer. Note that the TCGA level 3 data include the processed output only, not the raw sequence data. All GEO/SRA experiments and TCGA data were manually annotated into tissue and disease types using the controlled vocabulary of eVOC (24) and MeSH (25), respectively. Table 1 shows the summary of datasets included in this update.

Table 1.

Statistics for deep sequencing data and curation result

		GEO	SRA	TCGA	Total
Curation	No. of studies	44	10	19	73
	No. of samples	660	56	3949	4665
	No. of anatomies	54	15	18	71
	No. of diseases	26	4	17	38
Mapping	No. of total reads	3 651 203 657	545 986 295	–	4 197 189 952
	No. of trimmed reads	2 704 297 513	147 800 838	–	2 852 098 351
	No. of mapped reads	2 129 934 409	392 826 996	–	2 522 761 405
	No. of mapped reads to miRNAs region	1 663 515 565	286 992 242	–	1 950 507 807
	No. of mapped reads to ncRNAs region	108 819 368	20 060 074	–	128 879 442
	No. of mapped reads to genomic region	191 686 502	22 757 497	–	214 443 999
Processing result	No. of pre-miRNAs	1521	1429	747	1522
	No. of mature miRNAs	1843	1661	934	1856
	No. of other ncRNAs	6421	6286	–	6424
	No. of predicted pre-miRNAs	286	69	–	304
	No. of predicted mature miRNAs	475	94	–	508
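
The read counts in the Mapping rows of Table 1 can be cross-checked with a few lines of arithmetic. The sketch below is not part of the miRGator pipeline; it only re-adds the GEO and SRA columns and derives the share of mapped reads that fall in miRNA regions. The dictionary and function names are illustrative assumptions.

```python
# Mapping rows of Table 1 as (GEO, SRA, Total). TCGA provides no raw reads,
# so it contributes nothing to these rows.
MAPPING_ROWS = {
    "total reads": (3_651_203_657, 545_986_295, 4_197_189_952),
    "trimmed reads": (2_704_297_513, 147_800_838, 2_852_098_351),
    "mapped reads": (2_129_934_409, 392_826_996, 2_522_761_405),
    "mapped to miRNA regions": (1_663_515_565, 286_992_242, 1_950_507_807),
    "mapped to ncRNA regions": (108_819_368, 20_060_074, 128_879_442),
    "mapped to genomic regions": (191_686_502, 22_757_497, 214_443_999),
}


def inconsistent_rows(rows):
    """Return the names of rows where GEO + SRA does not equal the Total column."""
    return [name for name, (geo, sra, total) in rows.items() if geo + sra != total]


mismatches = inconsistent_rows(MAPPING_ROWS)
mapped_total = MAPPING_ROWS["mapped reads"][2]
mirna_fraction = MAPPING_ROWS["mapped to miRNA regions"][2] / mapped_total

print("rows with inconsistent totals:", mismatches)
print(f"share of mapped reads in miRNA regions: {mirna_fraction:.1%}")
```

Only the Mapping rows are summed here. The Processing result rows count distinct miRNAs, many of which are shared between sources, so their Total column is not a simple sum.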

The miRNA deep sequencing data were aligned to the reference human genome (hg19) using the Bowtie program (version 0.12.7) (26) after adaptor sequences, obtained from the original paper or the manufacturer's platform information, were trimmed with Cutadapt (version 1.1) (27). Up to two mismatches were allowed in the alignment process to identify iso-miRs or miRNA modifications. Short reads mapped onto the known miRNA loci from miRBase v18 (1) or ncRNA regions from Ensembl (release 67) (28) were classified as miRNA or ncRNA reads, respectively. This procedure yielded 1856 known miRNAs and 6424 ncRNAs. Remaining reads were used to predict novel miRNAs using the miRDeep2 software (29). With thresholds of 95% for the estimated true-positive probability and 0.05 for the randfold P-value, we obtained 508 mature and 304 pre-miRNA candidates. Further details of the analysis pipeline and program options are available in the online documentation.

For quantification of miRNA abundance, we used the quantile normalization method for read numbers within each miRNA locus. Differentially expressed miRNAs (DEmiRs) between tumor and normal tissues were obtained with the edgeR program (version 2.6.10) (30) after converting the normalized numbers into the nearest integer values.

RNA-seq data were aligned to the human genome (hg19) with the TopHat program (version 2.0.0) (31) after adaptor removal and careful quality control checks. Cufflinks (version 1.3.0) (32) was used to quantify the mRNA abundance.
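
The quantile normalization and integer conversion steps mentioned above can be illustrated with a minimal sketch. This is not the miRGator code: it assumes the textbook form of quantile normalization (each sample's sorted counts are replaced by the across-sample mean at the same rank), breaks ties by input order, and uses hypothetical sample names and counts.

```python
from typing import Dict, List


def quantile_normalize(counts: Dict[str, List[float]]) -> Dict[str, List[float]]:
    """Quantile-normalize read counts.

    `counts` maps a sample name to its read counts, one value per miRNA,
    with miRNAs listed in the same order for every sample.
    """
    samples = list(counts)
    n = len(counts[samples[0]])
    if any(len(counts[s]) != n for s in samples):
        raise ValueError("every sample needs one value per miRNA")

    # Reference distribution: mean of the k-th smallest value across samples.
    sorted_cols = [sorted(counts[s]) for s in samples]
    reference = [sum(col[k] for col in sorted_cols) / len(samples) for k in range(n)]

    normalized = {}
    for s in samples:
        # Indices ordered by value; Python's sort is stable, so ties keep input order.
        order = sorted(range(n), key=lambda i: counts[s][i])
        out = [0.0] * n
        for rank, idx in enumerate(order):
            out[idx] = reference[rank]
        normalized[s] = out
    return normalized


# Hypothetical read counts for four miRNAs in three samples.
raw = {
    "sample_A": [5, 2, 3, 4],
    "sample_B": [4, 1, 4, 2],
    "sample_C": [3, 4, 6, 8],
}
norm = quantile_normalize(raw)

# Count-based tools such as edgeR expect integers, hence the rounding step.
rounded = {s: [round(v) for v in vals] for s, vals in norm.items()}
print(rounded)
```

After normalization every sample has the same set of values, so differences between samples reflect only the rank of each miRNA within its sample.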

miR-seq BROWSER

The miR-seq browser was specifically designed for examining sequence alignments and normalized read counts together with secondary structure information in an intuitive and interactive fashion. Short reads related to iso-miRs and miRNA editing can be readily identified with the corresponding expression values (read counts) in multiple samples. This feature can be of significant value for scientists studying the biological roles of iso-miRs and miRNA editing. Figure 2 shows a screenshot of the miR-seq browser. The secondary structure, obtained from the Vienna RNA package (33), is displayed in the top panel and is also indicated as different shades in the alignment window. Selecting a nucleotide in the secondary structure highlights the corresponding nucleotide in the sequence alignment panel. Mismatched nucleotides are shown in red. Users may add, delete or edit read sequences. The read count table can be used to explore the variable expression of iso-miRs and differential miRNA processing. Expression level is also reflected in the background color of each cell in this table. We have further implemented many user-friendly features such as zoom-in/out, reordering of reads (drag & drop), sorting by expression level and saving/restoring of the configuration. It is also possible to upload user sequences in the BAM file format. Detailed instructions for using the miR-seq browser are available in the online help page.

Figure 2.

Main features of the miR-seq Browser. At the top panel, the hairpin structure of miRNA precursor is shown. The aligned short reads are shown together with secondary structure and read depth information in the track. By mouseover (hand icon) on a nucleotide, the corresponding columns are highlighted in vertical pink shadow. The reads can be sorted by the read count of each sample or the total sum on the right panel. Note that several read sequences show 3′-end modifications. Histogram shows the read depth at each position. Mismatched nucleotides are highlighted in red. Sequence editor window is opened by right-click.

miRNA, TARGET mRNA AND EXPRESSION CORRELATION

Inferring molecular functions of miRNAs is a non-trivial process because of the uncertainty in the relationships between miRNAs and target mRNAs. Only a small portion of target mRNAs is known for a limited number of miRNAs, and typical prediction programs tend to yield too many false positives. We have compiled a variety of miRNA–mRNA relationships and integrated them with expression correlations to help users identify reliable targets readily. Validated miRNA target genes were obtained from miRecords (version 3), miRTarBase (version 2.5) and Tarbase (version 5). Predicted target relationships were collected from Microcosm Targets (version 5) (34), miRDB (version 4), miRNA.org (August 2010), PITA (version 6), PicTar (May 2004) and TargetScan (version 6.2). In total, miRGator v3.0 includes 4745 validated and 6 218 792 predicted target relations, nearly double the number in the previous version. Expression correlation is useful for discerning between direct and indirect targets. Inversely correlated expression of a miRNA and its putative target mRNAs is strong evidence for a genuine relation. We calculated the correlation coefficient using mRNA-seq and miRNA-seq deep sequencing data from the same sample. We used the Spearman rank correlation, which is robust to the different normalization methods applied to mRNA-seq and miRNA-seq data. Target relation and expression correlation are visually represented in two formats, as shown in Figure 3. The heat-map view shows the expression correlation between a miRNA and its target mRNAs within each dataset. Clicking a cell displays the scatter plot of miRNA and mRNA expression, which can be used to examine sample-dependent variation of gene expression. The source of target information is also indicated to help users identify consensus targets, which are more likely to be genuine targets (13). All information on target relationships and expression correlations is downloadable in Excel format to allow more elaborate analysis by users.
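
The Spearman rank correlation used above can be sketched in a few lines. This is an illustrative implementation rather than the miRGator code: it ranks both series (ties receive the average rank) and then computes the Pearson correlation of the ranks. The expression values are hypothetical.

```python
import math
from typing import List, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """Return 1-based ranks; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(vx * vy)


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman rank correlation: Pearson correlation of the average ranks."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two series of equal length with at least 2 values")
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical paired measurements from five samples.
mirna_counts = [120, 340, 560, 900, 1500]   # normalized miRNA read counts
mrna_levels = [35.0, 28.5, 30.1, 12.2, 4.0]  # mRNA abundance of a putative target

rho = spearman(mirna_counts, mrna_levels)
# A monotonic rescaling of one series leaves the ranks, and hence rho, unchanged.
rho_log = spearman(mirna_counts, [math.log2(v + 1) for v in mrna_levels])
print(rho, rho_log)
```

Because only ranks enter the calculation, the coefficient is unaffected by any monotonic transformation of either data type, which is why differently normalized miRNA-seq and mRNA-seq values can be compared directly. A strongly negative value is the pattern expected for a genuine miRNA–target pair.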

Figure 3.

Concurrent inspection of miRNA–mRNA target relations and expression correlation for hsa-let-7a-5p. (a) The validated and predicted miRNA targets are shown together with their expression correlations as a heat-map. The expression values of the miRNA–target pair can be shown for a dataset by clicking a cell, as shown in the inset picture. (b) An example of miRNA–target network visualization. The targets showing the opposite expression pattern to the miRNA are closely placed.

The network view shows the target relationships as a graph. Users may select the validated or predicted target relations, the study ID of the source data and the samples. Gene expression levels or fold changes, if applicable, are shown as the node color. The network view illustrates target relations and expression correlations in a more intuitive manner, but it is limited to displaying the expression in a single study or sample. It should be noted that the miRNA–mRNA relation can be queried either by miRNA name or by gene name, which is a useful feature for investigating any synergistic effect in miRNA or gene function (35).

GENE SET ANALYSIS

Gene Set Analysis (GSA) is commonly used to interpret a list of genes from high-throughput experiments such as microarray and mass spectrometry. The GSA tool of miRGator v3.0 enables the user to compare a list of genes against a priori defined gene sets such as KEGG pathways, Gene Ontology terms, the validated/predicted miRNA target DBs and the inversely coexpressed gene sets described in the previous section. The statistical significance is calculated as a P-value by the hypergeometric test and is corrected for multiple testing using the Bonferroni method.
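
The hypergeometric test with Bonferroni correction can be sketched as follows. This is a minimal illustration, not the miRGator implementation: it assumes the usual one-sided enrichment test (probability of seeing at least the observed overlap), and the gene set names, sizes and overlaps are hypothetical.

```python
from math import comb
from typing import Dict


def hypergeom_upper_tail(population: int, set_size: int, list_size: int, overlap: int) -> float:
    """P(X >= overlap) when `list_size` genes are drawn without replacement
    from `population` genes, of which `set_size` belong to the gene set."""
    upper = min(set_size, list_size)
    favourable = sum(
        comb(set_size, i) * comb(population - set_size, list_size - i)
        for i in range(overlap, upper + 1)
    )
    return favourable / comb(population, list_size)


def bonferroni(pvalues: Dict[str, float]) -> Dict[str, float]:
    """Multiply each P-value by the number of tests, capped at 1."""
    m = len(pvalues)
    return {name: min(1.0, p * m) for name, p in pvalues.items()}


# Hypothetical example: 20 000 background genes, a user list of 200 genes,
# and three gene sets given as (set size, overlap with the user list).
POPULATION, LIST_SIZE = 20_000, 200
gene_sets = {
    "pathway_A": (150, 12),
    "pathway_B": (300, 4),
    "targets_of_miR_X": (500, 20),
}

raw_p = {
    name: hypergeom_upper_tail(POPULATION, size, LIST_SIZE, overlap)
    for name, (size, overlap) in gene_sets.items()
}
adjusted_p = bonferroni(raw_p)
for name in gene_sets:
    print(f"{name}: raw P = {raw_p[name]:.3g}, Bonferroni P = {adjusted_p[name]:.3g}")
```

Exact integer binomial coefficients are used to avoid floating-point underflow in the individual terms. Bonferroni is the most conservative of the common corrections, so sets that remain significant after adjustment are strong candidates.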

USER INTERFACE

The miRGator v3.0 website incorporates various user-friendly features. Most menus are self-explanatory except the miR-seq browser, for which detailed instructions are available in the help page. Basic search can be performed for miRNA, disease and anatomy names. The search window suggests plausible keywords and supports auto-completion. Search output for a miRNA query consists of (i) basic information including GeneRIF information, (ii) relevant studies, (iii) samples in the selected study, where the link to the miR-seq browser is available, and (iv) miRNA expression profiles in disease, tissue and organ categories. Anatomy or disease queries output relevant studies and the DEmiRs from each study. Search in the ‘miR-target & Expression’ menu can be performed for a miRNA or gene of interest, and miRNA–target information is produced with expression correlation as explained in the previous section.

CONCLUSION

With the addition of deep sequencing data and the implementation of several novel tools, miRGator v3.0 continues to be an integrated resource of up-to-date information on miRNA sequences, expression profiling and target identification. These new data and functions should be valuable for understanding miRNA biogenesis and molecular functions. However, many aspects remain to be improved. Keeping up with the flood of new data is the most critical task, since many sequencing studies, including the TCGA project, are currently in progress. We plan to update the data annually. Another planned major advance is to expand the scope to other organisms such as mice, for which detailed phenotype information is available via the International Mouse Phenotyping Consortium.

FUNDING

Korea Research Institute of Bioscience and Biotechnology (KRIBB) Research Initiative Program; National Research Foundation of Korea (NRF) grant funded by the Korea government (MEST) [2012-0006002, 2012-0006011, 2012-0366723, 2011-0014992, 2012-0000952]; GIST Systems Biology Infrastructure Establishment Grant (2012) through Ewha Research Center for Systems Biology (ERCSB); Ewha Global Top 5 grant and RP-Grant 2012 from Ewha Womans University. Funding for open access charge: KRIBB Research Initiative Program.

Conflict of interest statement. None declared.

32 in total

1. Prediction of both conserved and nonconserved microRNA targets in animals.

Authors: Xiaowei Wang; Issam M El Naqa
Journal: Bioinformatics Date: 2007-11-29 Impact factor: 6.937

2. A comprehensive survey of 3' animal miRNA modification events and a possible role for 3' adenylation in modulating miRNA targeting effectiveness.

Authors: A Maxwell Burroughs; Yoshinari Ando; Michiel J L de Hoon; Yasuhiro Tomaru; Takahiro Nishibu; Ryo Ukekawa; Taku Funakoshi; Tsutomu Kurokawa; Harukazu Suzuki; Yoshihide Hayashizaki; Carsten O Daub
Journal: Genome Res Date: 2010-08-18 Impact factor: 9.043

3. NCBI GEO: archive for functional genomics data sets--10 years on.

Authors: Tanya Barrett; Dennis B Troup; Stephen E Wilhite; Pierre Ledoux; Carlos Evangelista; Irene F Kim; Maxim Tomashevsky; Kimberly A Marshall; Katherine H Phillippy; Patti M Sherman; Rolf N Muertter; Michelle Holko; Oluwabukunmi Ayanbule; Andrey Yefanov; Alexandra Soboleva
Journal: Nucleic Acids Res Date: 2010-11-21 Impact factor: 16.971

4. MiRNA-miRNA synergistic network: construction via co-regulating functional modules and disease miRNA topological features.

Authors: Juan Xu; Chuan-Xing Li; Yong-Sheng Li; Jun-Ying Lv; Ye Ma; Ting-Ting Shao; Liang-De Xu; Ying-Ying Wang; Lei Du; Yun-Peng Zhang; Wei Jiang; Chun-Quan Li; Yun Xiao; Xia Li
Journal: Nucleic Acids Res Date: 2010-10-06 Impact factor: 16.971

5. HOCTAR database: a unique resource for microRNA target prediction.

Authors: Vincenzo Alessandro Gennarino; Marco Sardiello; Margherita Mutarelli; Gopuraja Dharmalingam; Vincenza Maselli; Giampiero Lago; Sandro Banfi
Journal: Gene Date: 2011-03-22 Impact factor: 3.688

6. Ensembl Genomes: an integrative resource for genome-scale data from non-vertebrate species.

Authors: Paul J Kersey; Daniel M Staines; Daniel Lawson; Eugene Kulesha; Paul Derwent; Jay C Humphrey; Daniel S T Hughes; Stephan Keenan; Arnaud Kerhornou; Gautier Koscielny; Nicholas Langridge; Mark D McDowall; Karine Megy; Uma Maheswari; Michael Nuhn; Michael Paulini; Helder Pedro; Iliana Toneva; Derek Wilson; Andrew Yates; Ewan Birney
Journal: Nucleic Acids Res Date: 2011-11-08 Impact factor: 16.971

7. miRecords: an integrated resource for microRNA-target interactions.

Authors: Feifei Xiao; Zhixiang Zuo; Guoshuai Cai; Shuli Kang; Xiaolian Gao; Tongbin Li
Journal: Nucleic Acids Res Date: 2008-11-07 Impact factor: 16.971

8. edgeR: a Bioconductor package for differential expression analysis of digital gene expression data.

Authors: Mark D Robinson; Davis J McCarthy; Gordon K Smyth
Journal: Bioinformatics Date: 2009-11-11 Impact factor: 6.937

9. TopHat: discovering splice junctions with RNA-Seq.

Authors: Cole Trapnell; Lior Pachter; Steven L Salzberg
Journal: Bioinformatics Date: 2009-03-16 Impact factor: 6.937

10. The microRNA.org resource: targets and expression.

Authors: Doron Betel; Manda Wilson; Aaron Gabow; Debora S Marks; Chris Sander
Journal: Nucleic Acids Res Date: 2007-12-23 Impact factor: 16.971
