Ioannis Kavakiotis, Athanasios Alexiou, Spyros Tastsoglou, Ioannis S Vlachos, Artemis G Hatzigeorgiou.
Abstract
MicroRNAs (miRNAs) are short (∼23 nt) single-stranded non-coding RNAs that act as potent post-transcriptional regulators of gene expression. Information about miRNA expression and distribution across cell types and tissues is crucial for understanding their function and for their translational use as biomarkers or therapeutic targets. DIANA-miTED is the most comprehensive and systematic collection of miRNA expression values, derived from the analysis of 15 183 raw human small RNA-Seq (sRNA-Seq) datasets from the Sequence Read Archive (SRA) and The Cancer Genome Atlas (TCGA). Because metadata quality maximizes the utility of expression atlases, we manually curated SRA- and TCGA-derived information to deliver a comprehensive and standardized set that incorporates, in total, 199 tissues, 82 anatomical sublocations, 267 cell lines and 261 diseases. miTED offers rich, instant visualizations of the expression and sample distributions of requested data across variables, as well as study-wide diagrams and graphs that enable efficient content exploration. Queries also generate links to state-of-the-art miRNA functional resources, making miTED an ideal starting point for expression retrieval, exploration, comparison and downstream analysis, without requiring bioinformatics support or expertise. DIANA-miTED is freely available at http://www.microrna.gr/mited.
Year: 2022 PMID: 34469540 PMCID: PMC8728140 DOI: 10.1093/nar/gkab733
Source DB: PubMed Journal: Nucleic Acids Res ISSN: 0305-1048 Impact factor: 16.971
Comparison of DIANA-miTED against existing resources cataloguing miRNA expression. For each resource, the table lists the number of included datasets and the fold-difference relative to miTED, the available expression units, the miRBase version and the miRNA-relevant functionalities. NA values correspond to cases in which accessing the resource or performing queries failed.
| Resource | Included datasets | Fold-increase in miTED | Expression units | miRBase version | miRNA-related functionalities | Links to tools and resources |
|---|---|---|---|---|---|---|
| | 304 | ∼50 | RPM | v21 | Search by single/multiple miRNAs. Bulk data download. | miRBase, NCBI-SRA |
| | 401 | ∼38 | RPM | v20 | NA | NA |
| | 802 | ∼19 | Counts, RPM | v19, v21 | Search by single miRNA, coordinates, or sequence. Tissue specificity and read coverage information. Bulk data download. | UCSC Genome Browser |
| | ∼11 000 | ∼1.38 | NA | v21 | NA | NA |
| | 4258 | ∼4.2 | RPM | v21 | Search by single/multiple miRNAs. Differential expression. Gene targets (miRTarBase). Disease associations. Result download. | miRBase, GeneCards, GEO, PubMed, Disease Ontology |
| | ∼11 500 | ∼1.31 | RPM, log2(RPM)-mean | v22 | Precursor-level miRNA expression (collected RPM values). Expression heatmaps. Query by dataset selection. Bulk data download. | UCSC Genome Browser, PubMed |
| DIANA-miTED | 15 183 | - | Counts, RPM, log2(RPM) | v22 | Search by single/multiple miRNAs and/or single/multiple tissues or cell lines. Filter disease/healthy. Search for top expressed miRNAs. Search for top sites by miRNA abundance. Result download. | DIANA-tools (microT-CDS, TarBase, LncBase, miRPath), miRBase |
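The fold-increase column can be read as the ratio of miTED's dataset count to that of the compared resource (for example, 15 183 / 304 ≈ 50). A minimal sketch of that arithmetic, using three of the exact dataset counts from the table; the function and constant names are illustrative and not part of miTED:

```python
# Dataset counts come from the comparison table; all names here are illustrative.
MITED_DATASETS = 15_183


def fold_increase(other_datasets: int, reference: int = MITED_DATASETS) -> float:
    """Return how many times larger the reference collection is."""
    if other_datasets <= 0:
        raise ValueError("dataset count must be positive")
    return reference / other_datasets


# Three rows of the table that report exact dataset counts.
for n in (304, 401, 802):
    print(n, round(fold_increase(n), 1))
```

Rounding the printed ratios to whole numbers gives the ∼50, ∼38 and ∼19 values listed for those rows.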
Figure 1. DIANA-miTED development workflow. Initially, raw human sRNA-Seq datasets were retrieved from NCBI-SRA and TCGA (alignment files were obtained from TCGA and converted back to FASTQ format). Raw datasets were uniformly subjected to pre-processing and quality control, alignment and quantification. Read count, RPM and log2(RPM) values were calculated for miRBase miRNAs. Metadata from both resources were curated manually to create a comprehensive, standardized set of metadata annotations for the analyzed datasets. The DIANA-miTED resource was developed using MongoDB (a NoSQL database), PHP/Laravel for the data access layer, and TypeScript/Angular for the application layer. miTED features extensive query, filtering and visualization options and supports local retrieval of requested data.
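The three expression units named in the workflow are related by a simple scaling. The sketch below shows one common way to derive RPM and log2(RPM) from raw read counts. It is not the miTED pipeline itself: the normalisation denominator (here, the sum of the supplied counts unless `total_reads` is given) and the pseudocount of 1 are assumptions, and the sample counts are made up.

```python
import math


def to_rpm(counts, total_reads=None):
    """Scale raw read counts to reads per million (RPM).

    total_reads is the normalisation denominator. If omitted, the sum of the
    supplied counts is used (an assumption, not necessarily miTED's choice).
    """
    total = sum(counts.values()) if total_reads is None else total_reads
    if total <= 0:
        raise ValueError("total read count must be positive")
    return {mirna: c * 1_000_000 / total for mirna, c in counts.items()}


def to_log2_rpm(rpm, pseudocount=1.0):
    """log2-transform RPM values; the assumed pseudocount avoids log2(0)."""
    return {mirna: math.log2(v + pseudocount) for mirna, v in rpm.items()}


# Made-up counts for a single sample, for illustration only.
sample_counts = {"hsa-miR-21-5p": 500, "hsa-let-7a-5p": 300, "hsa-miR-122-5p": 200}
rpm = to_rpm(sample_counts)
log2_rpm = to_log2_rpm(rpm)
print(rpm)
print(log2_rpm)
```

RPM makes samples with different sequencing depths comparable, while the log transform compresses the wide dynamic range typical of miRNA abundance.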
Figure 2. Multi-query page interface. (A) Submission form. Users may search for one or more of 2656 miRNAs (1) and Tissues (2). Both the miRNAs and Tissues query boxes support free-text search. Through filtering options, users may restrict their query to specific Diseases (3), Collections (4) and Health status (5). Via the Expression value drop-down menu (6), users can choose the expression unit to be returned (read counts, RPM or log2(RPM)). (B) Results table. All entries matching the applied criteria are returned, along with their metadata and the expression values of the selected miRNAs. The results list can be customized to show 20, 50, 100, 150 or 200 items per page. A word-based filter, Filter-down results (7), narrows down the returned entries to those that contain a specific term of interest. Users can retrieve the results of their query in tab-delimited format by clicking the Download data button (8), without any sign-up, application or verification procedure. (C) Interactive boxplot showing the miRNA abundance distribution per tissue/organ. Users can select (9) which miRNAs are visible in the diagram, allowing direct comparison among miRNAs. On hover (10), boxplots reveal the corresponding boxplot statistics (minimum, maximum, median, lower fence, first quartile and third quartile). (D) Interactive Sankey diagrams enable visual inspection of the tissue-disease relationships in the query results. On hover (11), users may explore the distribution of samples in more detail. (E) Pie charts visualize the distribution of samples across the Gender (12), Health state (13) and Collection (14) variables.
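The boxplot statistics shown on hover in panel (C) can be reproduced from a list of expression values. The sketch below is one conventional way to compute them. The quartile method (`inclusive` interpolation) and the definition of the lower fence as the smallest value within 1.5 × IQR below the first quartile are assumptions, since the caption does not state which conventions the miTED plots use. The input values are made up.

```python
import statistics


def boxplot_stats(values, whisker=1.5):
    """Return the six statistics named in the caption for one distribution."""
    data = sorted(values)
    if len(data) < 2:
        raise ValueError("need at least two values")
    # Quartiles by linear interpolation over the observed range (assumed method).
    q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")
    # Assumed fence rule: smallest observation not below Q1 - whisker * IQR.
    lower_limit = q1 - whisker * (q3 - q1)
    lower_fence = min(v for v in data if v >= lower_limit)
    return {
        "minimum": data[0],
        "maximum": data[-1],
        "median": median,
        "lower_fence": lower_fence,
        "first_quartile": q1,
        "third_quartile": q3,
    }


# Made-up log2(RPM) values for one miRNA in one tissue, with one low outlier.
stats = boxplot_stats([0.5, 10, 11, 12, 13, 14, 15, 16, 17])
print(stats)
```

With these values the minimum (0.5) falls below the lower fence (10), which is why a boxplot can report the two separately.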
Figure 3. Visualizations. (A) Interactive graph network relating tissues/organs and tissue subregions in the samples included in miTED. Users may explore the graph and highlight nodes of interest, revealing the most and least populated tissues and organs. (B) Interactive Sankey diagrams depicting the relationships between ‘Tissue or organ of origin’ and Disease, and between ‘Tissue or organ of origin’ and Gender. On hover, users can explore the distribution of samples per category.