Xiangfu Zhong1, Fatima Heinicke1, Simon Rayner1.
Abstract
microRNAs are small non-coding RNA molecules that play a central role in gene regulation. miRBase is the standard reference source for analysis and interpretation of experimental studies. However, the richness and complexity of the annotation is often underappreciated by users. Moreover, even for experienced users, the size of the resource can make it difficult to explore the annotation to determine features such as species coverage, the impact of specific characteristics and changes between successive releases. A further consideration is that each new miRBase release contains entries that have had limited review and which may be removed in a future release to ensure the quality of annotation. To aid the miRBase user, we developed a software tool, miRBaseMiner, for investigating miRBase annotation and generating custom annotation sets. We apply the tool to characterize each release from v9.2 to v22 to examine how annotation has changed across releases and highlight some of the annotation features that users should keep in mind when using miRBase for data analysis. These include: (1) entries with identical or very similar sequences; (2) entries with multiple annotated genome locations; (3) hairpin precursor entries with extremely low estimated minimum free energy; (4) entries with reverse complementary sequences; (5) entries with 3' poly(A) ends. As each of these factors can impact the identification of dysregulated features and subsequent clinical or biological conclusions, miRBaseMiner is a valuable resource for any user relying on miRBase as a reference source.
Keywords: microRNA; NGS; annotation; characterization; miRBase; miRBaseMiner; miRNA
Year: 2019 PMID: 31251108 PMCID: PMC6779376 DOI: 10.1080/15476286.2019.1637680
Source DB: PubMed Journal: RNA Biol ISSN: 1547-6286 Impact factor: 4.652
Figure 1. (a) Schematic of miRNA nomenclature used in miRBase release 22. Each entry contains the following fields delimited with a hyphen: (1) three to seven letters indicating species; (2) miR/mir indicating miRNA and miRNA hairpin precursor, respectively; (3) a numeric suffix that is assigned sequentially to new entries (in plant miRNAs there is no hyphen delimiter between this field and the species field). Additional letters indicate that the mature miRNA sequence is shared by entries with the same numeric suffix, and additional numbers indicate that the miRNA is generated from different hairpin precursors; (4) 3p/5p indicates from which arm of the hairpin precursor the miRNA was generated. (b) Overview of content in successive releases of miRBase (Y-axis corresponds to miRBase releases from 9.2 to 22). Left: bar plot showing the number of annotated species (X-axis) in each miRBase release. Middle: heat map of the number of miRNA entries for the 26 species with more than 500 entries in miRBase v22. X-axis corresponds to the 26 species, ordered by total number of miRNAs for each species. Only a few species contain a large number of miRNAs (in red). Right: bar plot showing the total number of miRNA entries summed over all species (X-axis). (c) Sequence length distribution of miRNAs from the 26 species in (b). The average miRNA length is 21-22 nucleotides, but many entries are shorter or longer than this.
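The naming scheme in panel (a) can be illustrated with a small parser. This is a minimal sketch and not part of miRBaseMiner: the function name `parse_mirna_name` and the regular expression are assumptions made for illustration. The hyphen before the numeric suffix is treated as optional to cover the plant style mentioned in the caption, and names that do not follow the miR/mir convention (for example let-7) are deliberately not handled.

```python
import re

# Hypothetical pattern for illustration only; not taken from miRBase or miRBaseMiner.
NAME_PATTERN = re.compile(
    r"(?P<species>[a-z]{3,7})-"   # (1) species code, three to seven letters
    r"(?P<kind>miR|mir)-?"        # (2) miR = mature miRNA, mir = hairpin precursor
    r"(?P<number>\d+)"            # (3) sequential numeric suffix
    r"(?P<letter>[a-z]*)"         #     optional additional letter(s)
    r"(?:-(?P<copy>\d+))?"        #     optional additional number (different precursors)
    r"(?:-(?P<arm>[35]p))?"       # (4) optional hairpin arm, 3p or 5p
)


def parse_mirna_name(name):
    """Split a miRBase-style name into its fields, or return None if it does not fit."""
    match = NAME_PATTERN.fullmatch(name)
    if match is None:
        return None
    fields = match.groupdict()
    fields["type"] = "mature" if fields["kind"] == "miR" else "precursor"
    return fields


print(parse_mirna_name("hsa-miR-19b-1-5p"))
print(parse_mirna_name("hsa-mir-19b-1"))
```

Fields that are absent from a name come back as `None` (or an empty string for the letter), so downstream code can tell a missing arm from a parse failure.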
Examples of differences in miRNA length descriptions and read length filtering used in miRNA studies.
| miRNA description | Read length filtering |
|---|---|
| ~22 nt | 19-20 nt |
| ~23 nt | ≤ 25 nt |
| 21-23 nt | 19-26 nt |
| ~19-24 nt | 16-30 nt |
| 21-24 nt | 16-27 nt |
| ~20-22 nt | > 17 nt |
| 17-23 nt | |
| 18-24 nt | |
| 20-24 nt | |
| 19-22 nt | |
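To make the effect of a read length window concrete, the following minimal sketch applies an inclusive length filter to a handful of toy reads. The function name `filter_reads_by_length` and the reads themselves are hypothetical, and the thresholds are left as parameters because the studies in the table use different ones.

```python
def filter_reads_by_length(reads, min_len, max_len):
    """Keep reads whose length lies in the inclusive range [min_len, max_len]."""
    return [read for read in reads if min_len <= len(read) <= max_len]


# Toy reads of length 15, 20, 22 and 31 nt (made up for illustration).
reads = ["A" * 15, "ACGU" * 5, "A" * 22, "A" * 31]

wide = filter_reads_by_length(reads, 16, 30)
narrow = filter_reads_by_length(reads, 19, 20)
print([len(r) for r in wide])    # [20, 22]
print([len(r) for r in narrow])  # [20]
```

With the narrow window the 22 nt read is discarded even though it falls inside most of the miRNA length ranges listed in the first column.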
Figure 2. (a) Bar plots showing the number of miRNA (right column) and pre-miRNA (left column) entries that were updated between successive versions of miRBase from release 9.2 to 22. The rows correspond to four categories: NEW, NAME, SEQUENCE and DELETE. In each plot, the X-axis shows miRBase versions in chronological order and the Y-axis shows the number of updated miRNAs/hairpin precursors. Red corresponds to human data and green to mouse data. (b) Five types of sequence changes that occur between successive miRBase releases. The first three examples are changes between release 21 and 22. The last two examples are changes between miRBase version 17 and 18. Nucleotide changes between the two releases are marked in blue. The bottom sequence is the original miRNA sequence in the previous version of miRBase; the top sequence is the one in the newer release. The number after each sequence box represents the frequency of the corresponding type of miRNA sequence change in miRBase 22.
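The four update categories in panel (a) can be reproduced in principle by comparing two releases entry by entry. The sketch below is an illustration under stated assumptions, not the miRBaseMiner implementation: it assumes each release is available as a dictionary mapping a stable accession to a `(name, sequence)` tuple, and the function name `diff_releases` and the toy accessions are hypothetical.

```python
def diff_releases(old, new):
    """Classify entry changes between two releases.

    Both arguments map an accession (assumed stable across releases)
    to a (name, sequence) tuple.
    """
    changes = {"NEW": [], "NAME": [], "SEQUENCE": [], "DELETE": []}
    changes["NEW"] = sorted(new.keys() - old.keys())
    changes["DELETE"] = sorted(old.keys() - new.keys())
    for accession in sorted(old.keys() & new.keys()):
        old_name, old_seq = old[accession]
        new_name, new_seq = new[accession]
        if old_name != new_name:
            changes["NAME"].append(accession)
        if old_seq != new_seq:
            changes["SEQUENCE"].append(accession)
    return changes


# Toy data, made up for illustration.
release_a = {
    "ACC1": ("toy-miR-1", "ACGUACGUACGUACGUACGUA"),
    "ACC2": ("toy-miR-2", "UUGGCCAAUUGGCCAAUUGGC"),
    "ACC3": ("toy-miR-3", "GGGAAACCCUUUGGGAAACCC"),
}
release_b = {
    "ACC1": ("toy-miR-1-5p", "ACGUACGUACGUACGUACGUA"),  # renamed
    "ACC2": ("toy-miR-2", "UUGGCCAAUUGGCCAAUUGGCA"),    # sequence extended by one nt
    "ACC4": ("toy-miR-4", "CCCUUUGGGAAACCCUUUGGG"),     # newly added
}

print(diff_releases(release_a, release_b))
```

An entry can appear under both NAME and SEQUENCE if both fields changed between the two releases.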
Figure 3. The presence of identical sequences in human and mouse miRNA entries from miRBase version 9.2 to 22 (a, b), and sequence similarity in miRBase 22 (c, d). (a) Left: human miRNAs; right: mouse miRNAs. In miRBase 9.2, there are no miRNAs in human or mouse sharing an identical sequence with other entries. The colour denotes the number of miRNAs annotated with that sequence, from yellow to red indicating an increasing number. The number in each cell represents the number of miRNA entries with the sequence in that row in the miRBase version corresponding to that column. X-axis indicates the miRBase version; Y-axis indicates the duplicated miRNA sequence. (b) Examples of miRNAs sharing an identical sequence. The number in the green hexagon refers to the corresponding row in (a). The text in red upper case indicates the type of annotation change; NEW: newly added miRNA; DELETE: miRNA entry deleted from miRBase. The text above the arrows indicates the miRBase versions in which that change occurred. Right-hand plots: the similarity networks of human and mouse miRNAs and hairpin precursors in miRBase version 22, based on the pairwise Levenshtein distance matrix for Levenshtein distances less than three nucleotides. (c) Human miRNA and hairpin precursor network; (d) mouse miRNA and hairpin precursor network. Each dot represents a miRNA or hairpin precursor. Blue edge: two similar miRNAs; red edge: a hairpin precursor and its respective miRNA; green edge: two similar hairpin precursors. The darker colour (blue/green) corresponds to a Levenshtein distance equal to 0; the lighter colour corresponds to larger Levenshtein distances.
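The similarity networks are built from pairwise Levenshtein distances. As a rough illustration of that step, the sketch below computes the distance with the standard dynamic-programming recurrence and lists the pairs that fall under a cutoff. The function names, the toy sequences and the default cutoff of 2 (reading "less than three" literally) are assumptions; the actual tool may compute or threshold the distances differently.

```python
from itertools import combinations


def levenshtein(a, b):
    """Edit distance (insertions, deletions, substitutions) between two strings."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,                       # deletion
                current[j - 1] + 1,                    # insertion
                previous[j - 1] + (char_a != char_b),  # substitution or match
            ))
        previous = current
    return previous[-1]


def similar_pairs(sequences, max_distance=2):
    """Return (name1, name2, distance) for every pair within max_distance."""
    edges = []
    for (name1, seq1), (name2, seq2) in combinations(sorted(sequences.items()), 2):
        distance = levenshtein(seq1, seq2)
        if distance <= max_distance:
            edges.append((name1, name2, distance))
    return edges


# Toy sequences, made up for illustration.
toy = {"x": "ACGUACGU", "y": "ACGUACGA", "z": "UUUUUUUU"}
print(similar_pairs(toy))  # [('x', 'y', 1)]
```

The all-against-all comparison is quadratic in the number of entries, which is manageable for a single species but may need a faster library or pre-filtering for the full database.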
Comparison between the content of the full set and the high confidence set of human entries for miRBase release 22.
| Issue | Full miRBase entries | High-confidence miRBase entries |
|---|---|---|
| Species representation in miRNAs | hsa: 2656/48,885 (5.43%) | hsa: 1198/3982 (30.09%) |
| Species representation in precursors | hsa: 1917/38,589 (4.97%) | hsa: 658/2162 (30.43%) |
| miRNAs with identical sequence | hsa: 40/2656 (1.51%) | hsa: 0/1198 (0%) |
| Reverse complementary miRNAs | hsa: 20/2656 (0.75%) | hsa: 12/1198 (1.00%) |
| Reverse complementary pre-miRNAs | hsa: 24/1917 (1.25%) | hsa: 15/658 (2.28%) |
| miRNAs overlapping (including identical) | hsa: 71/2656 (2.67%) | hsa: 17/1198 (1.42%) |
| miRNAs with poly(A) tail (AAAA) | hsa: 4/2656 (0.15%) | hsa: 1/1198 (0.08%) |
| miRNAs (Levenshtein distance ≤ 3) | hsa: 348/2656 (13.10%) | hsa: 218/1198 (18.20%) |
| Pre-miRNAs (Levenshtein distance ≤ 3) | hsa: 111/1917 (5.79%) | hsa: 39/658 (5.93%) |
| Hairpins missing genome coordinates | hsa: 4/1917 (0.21%) | hsa: 0/1198 (0%) |
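Several of the issues in the table reduce to simple sequence checks. The following is a minimal sketch of three of them, assuming entries are held as a dictionary of name to RNA sequence. The function names and toy entries are hypothetical, and the definitions used here (exact full-length reverse complement, a 3' end of `AAAA`) are assumptions that may differ from the criteria applied by miRBaseMiner.

```python
from collections import defaultdict

COMPLEMENT = str.maketrans("ACGU", "UGCA")


def reverse_complement(seq):
    """Reverse complement of an RNA sequence written with A, C, G, U."""
    return seq.translate(COMPLEMENT)[::-1]


def group_by_sequence(entries):
    """Map each sequence to the sorted list of entry names that carry it."""
    groups = defaultdict(list)
    for name, seq in sorted(entries.items()):
        groups[seq].append(name)
    return groups


def identical_groups(entries):
    """Groups of entry names that share exactly the same sequence."""
    return [names for names in group_by_sequence(entries).values() if len(names) > 1]


def reverse_complement_pairs(entries):
    """Pairs of entries whose sequences are exact reverse complements."""
    groups = group_by_sequence(entries)
    pairs = []
    for name, seq in sorted(entries.items()):
        for other in groups.get(reverse_complement(seq), []):
            if name < other:  # report each pair once and skip self-matches
                pairs.append((name, other))
    return pairs


def poly_a_entries(entries, tail="AAAA"):
    """Entries whose 3' end matches the given poly(A) tail."""
    return sorted(name for name, seq in entries.items() if seq.endswith(tail))


# Toy entries, made up for illustration.
entries = {
    "toy-1": "UGAGGUAGUAGG",
    "toy-2": "UGAGGUAGUAGG",  # identical to toy-1
    "toy-3": "CCUACUACCUCA",  # reverse complement of toy-1
    "toy-4": "GGCUUCAAAA",    # ends in AAAA
}
print(identical_groups(entries))
print(reverse_complement_pairs(entries))
print(poly_a_entries(entries))
```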
Figure 4. The workflow of miRBaseMiner.