MirGeneDB 2.0: the metazoan microRNA complement.

Bastian Fromm1,2, Diana Domanska3,4, Eirik Høye2,5, Vladimir Ovchinnikov6,7, Wenjing Kang1, Ernesto Aparicio-Puerta8, Morten Johansen3, Kjersti Flatmark2,5,9, Anthony Mathelier10,11, Eivind Hovig2,3, Michael Hackenberg8, Marc R Friedländer1, Kevin J Peterson12.   

Abstract

Small non-coding RNAs have gained substantial attention due to their roles in animal development and human disorders. Among them, microRNAs are special because individual gene sequences are conserved across the animal kingdom. In addition, unique and mechanistically well understood features can clearly distinguish bona fide miRNAs from the myriad other small RNAs generated by cells. However, making this distinction is not a common practice and, thus, not surprisingly, the heterogeneous quality of available miRNA complements has become a major concern in microRNA research. We addressed this by extensively expanding our curated microRNA gene database - MirGeneDB - to 45 organisms, encompassing a wide phylogenetic swath of animal evolution. By consistently annotating and naming 10,899 microRNA genes in these organisms, we show that previous microRNA annotations contained not only many false positives, but surprisingly lacked >2000 bona fide microRNAs. Indeed, curated microRNA complements of closely related organisms are very similar and can be used to reconstruct ancestral miRNA repertoires. MirGeneDB represents a robust platform for microRNA-based research, providing deeper and more significant insights into the biology and evolution of miRNAs as well as biomedical and biomarker research. MirGeneDB is publicly and freely available at http://mirgenedb.org/.
© The Author(s) 2019. Published by Oxford University Press on behalf of Nucleic Acids Research.

Year:  2020        PMID: 31598695      PMCID: PMC6943042          DOI: 10.1093/nar/gkz885

Source DB:  PubMed          Journal:  Nucleic Acids Res        ISSN: 0305-1048            Impact factor:   16.971


INTRODUCTION

In the last two decades, the small non-coding RNA field has significantly expanded beyond such well known small RNAs as transfer RNAs (tRNAs), small nuclear RNAs (snRNAs) and small nucleolar RNAs (snoRNAs) (1) to include small interfering RNAs (siRNAs) (2), Piwi-interacting RNAs (piRNAs) (3), and, in particular, microRNAs (miRNAs) (4–7). Although both tRNAs (8) and ribosomal RNAs (9) can generate small regulatory RNAs, miRNAs are characterized by a distinctive suite of sequence features, in addition to striking sequence conservation, not seen in other types of small RNAs (10–12). Unfortunately, recognition and utilization of these clear and mechanistically well understood features is not common practice (13–23), which has, for instance, led to extreme overestimations of the human microRNA complement (24–27). Because of the fundamental roles miRNAs play in establishing robustness of gene regulatory networks across Metazoa (28,29), and their importance in development (30), formation of cell identity (31) and numerous human diseases including cancer (32,33), it is imperative that homologous miRNAs in different species are correctly identified, annotated, and named using consistent criteria against the backdrop of numerous other types of coding and non-coding RNA fragments (23,34,35). Further, it is vital that bona fide miRNAs are clearly distinguished from non-miRNAs to avoid spurious conclusions (e.g. (36–38)) concerning the role small RNAs play in human disease. Nonetheless, these goals are largely ignored by existing databases, such as miRBase (39), which has developed organically through community-wide submissions of published miRNA calls, and miRCarta (40), a repository that aims to provide miRNA candidates from ultra-deep sequencing experiments in human.
With respect to miRBase, several research groups have shown that up to two-thirds of the entries are false positives, with many entries being fragments of other classes of small RNAs including tRNAs and snoRNAs, in addition to numerous rRNA fragments (13–23). The interpretation of these non-miRNA fragments as bona fide miRNAs affects our understanding of how miRNAs evolve (41), and incorrectly annotated bona fide miRNAs can additionally lead to erroneous conclusions on miRNA biology (see, e.g. (42,43)). Inconsistencies in nomenclature and changes between miRBase releases have made it challenging to use miRBase over the years, leading to numerous community efforts both to independently identify changes to miRBase (44–49) and to develop independent (see (50)) and study-specific databases (14,51–55). To address these concerns, we previously developed a manually curated and open source miRNA gene database, MirGeneDB, which is based on consistent annotation and nomenclature criteria (23). However, because it contained only four species, its usefulness for comparative studies was severely limited. Here, we present a major update to our database, MirGeneDB version 2.0 (http://mirgenedb.org), which now contains high-quality annotations of 10 899 bona fide and consistently named miRNAs constituting 1275 miRNA families from 45 species, representing every major metazoan group, including many well-established and emerging invertebrate and vertebrate model organisms (Figure 1).
Figure 1.

The evolution of the 1275 microRNA families across the 45 metazoan species currently annotated in MirGeneDB. Conserved gains are shown in red; species-specific gains are shown in pink, and losses are shown in blue; and these gains and losses are mapped onto a generally accepted topology of these species rooted between the deuterostomes and protostomes with branch lengths corresponding to gains and losses, respectively. Note though that this topology is largely recovered when just analyzing the gains and losses of the miRNA families themselves as shown by the bootstrap values indicated at the nodes (Supplementary Methods); the only known nodes not recovered are nodes within the placental mammals and Ecdysozoa, primarily due to losses in rodents and nematodes, respectively.

EXPANSION OF MirGeneDB

For the expansion from version 1.0 to 2.0, we analyzed more than 400 publicly available small RNA sequencing datasets, with at least one representative dataset for each organism, that were automatically downloaded and processed using sRNAbench (56) and miRTrace (57), respectively. This allowed for a consistent and uniform annotation of miRNAomes for each species using MirMiner (11) (see Supplementary Table, ‘file_info’ for files and see Supplementary Information for detailed methods) (23). The existing MirGeneDB.org miRNA complements for human, mouse, chicken and zebrafish were expanded from our initial effort by 32, 54, 41 and 103 genes to a total of 556, 447, 270 and 390 genes, respectively (Supplementary Table, ‘table’), and annotation accuracy for human and zebrafish was further improved using available Cap Analysis of Gene Expression (CAGE) data (Supplementary Figure S1, Supplementary Table, ‘CAGE’; Supplementary Information) (58). We further used Dicer-, Drosha- and Exportin 5-knockout data (59), as well as primary cell expression data (58,60–62) to refine human annotations. Since its inception, MirGeneDB has given special attention to the precise annotation of both the 5p and 3p arms (thus allowing for better annotation of miRNA isoforms (63,64)), with a clear distinction made between sequenced reads and predicted reads for each miRNA entry. MirGeneDB 2.0 adds four new features related to the transcription and processing of miRNAs (Figure 2A). First, Group 2 miRNAs (65,66), those miRNA precursor transcripts that are mono-uridylated at their 3′ end (what we term the 3′ non-templated uridine, 3′NTU), are specifically tabulated, allowing the user to easily discriminate Group 2 from ‘Group 1’ (or canonical) miRNAs. Second, sequence motifs, including the 5′ ‘UG’ motif, the loop ‘UGU/G’ motif, as well as the 3′ CNNC motif (67–69), are bioinformatically identified for every miRNA primary transcript. Third, processing variants, where alternative Drosha/Dicer cuts significantly (>10% of available reads) affect the processed mature seed sequence of the locus (see for example ref. (70)), are added as distinct entries (indicated with the ‘v’ in the name). Some loci, like the Mir-203 gene (Figure 2A), show both mono-uridylation and variant processing such that only one of the two major variants is classified as a Group 2 miRNA. Finally, we also annotate antisense loci (‘-as’) for miRNA genes where again significant expression (>10%) of both sense and antisense strands is observed (71).
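The >10% rule for processing variants can be illustrated with a minimal sketch. It assumes that reads for one arm of a locus have already been tallied by the offset of their 5′ start, since a shifted 5′ end shifts the seed. The function name, the input layout and the read counts are hypothetical and are not taken from the MirGeneDB pipeline.

```python
def call_processing_variants(start_counts, min_fraction=0.10):
    """Return 5' start offsets whose share of reads exceeds min_fraction.

    start_counts maps a 5' start offset (relative to the most common
    cut site) to the number of reads beginning at that offset.
    Offsets are returned from most to least abundant.
    """
    total = sum(start_counts.values())
    if total == 0:
        return []
    ranked = sorted(start_counts.items(), key=lambda item: item[1], reverse=True)
    return [offset for offset, count in ranked if count / total > min_fraction]


# Hypothetical counts: two abundant start positions one nucleotide apart,
# plus a rare third start that stays below the threshold.
example_counts = {0: 6200, 1: 3100, 2: 45}
print(call_processing_variants(example_counts))  # [0, 1]
```

In this toy case the two retained offsets would correspond to two variant entries for the locus, whereas the rare third start is ignored.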
Figure 2.

The annotation of microRNA sequences and the implementation of transcriptional and processing information for each miRNA gene in MirGeneDB. (A) The structure and read stacks for Hsa-Mir-203. The precursor sequence is shown in bold; mature reads are shown in red and star reads in blue, with the reads per million for each major transcript detected shown to the far right. A ‘CNNC’ processing motif (68) is shown in yellow. Also shown are the 5′ and 3′ miRNA offset reads (magenta), which clearly conform to the indicated Drosha cut (staggered line, left) given the reads processed from this locus. The Dicer cut (staggered line, right) results in two primary mature forms (dark vs light red), what we term ‘variants’ (v), that are offset from one another by 1 nucleotide (gray). The 5′ end of variant 1 starts with the ‘G’, whereas the 5′ end of variant 2 is moved 1 nucleotide 3′ and starts with the ‘U’. Each of these two major Dicer products is accompanied by the appropriate star sequence, with variant 1 shown in dark blue and variant 2 in light blue. The mature form of variant 2, but not variant 1, is heavily mono-uridylated at its 3′ end (green circle) and is thus a ‘Group 2’ miRNA (59,66). (B) The quantification of Hsa-Mir-203 reads across various human-specific data sets. As expected (e.g. (72)), expression in skin is about 2 orders of magnitude higher than in the other organs sampled (e.g., brain, liver, stomach, lung, uterus, pancreas, testes, colorectum, small intestine and kidney), and the detection of the mature form is nearly 3 orders of magnitude higher than that of the star. Consistent with Mir-203 being a bona fide miRNA, expression is nearly abrogated in DROSHA and DICER knock-outs, and greatly diminished in the EXPORTIN-5 knock-out (59).

QUALITY OF MirGeneDB ANNOTATIONS

A robust database must be free of both false positive and false negative entries. miRBase categorizes a subset of its entries as high-confidence miRNAs, which are those that are highly expressed and show clear indications of proper processing, and has further introduced a public voting system to identify more high-quality candidates (73). MirGeneDB takes an alternative approach: rather than allowing for community annotation, the near-complete miRNA repertoire of each taxon is added to MirGeneDB using a consistent and well-defined set of criteria (23,34,74). When comparing MirGeneDB 2.0 and miRBase, the number of miRNAs conforming to the annotation criteria is about three times higher in MirGeneDB than it is in miRBase (2844 for the miRBase ‘high confidence’ set (73)). Further, because the primary requirement for the inclusion of a putative miRNA in miRBase is publication in a peer-reviewed journal, miRBase has over time become increasingly heterogeneous with respect to the number of miRNAs for closely related species, such as the often studied human and the rarely studied macaque (75) (Figure 3). This focus on model systems (in particular human, mouse and chicken) has resulted in miRBase having, on the one hand, a much larger number of annotated sequences for some of the 38 taxa shared with MirGeneDB 2.0, accounting for an estimated 5631 false positives, and, on the other hand, lacking 19% of all MirGeneDB 2.0 genes, accounting for 2097 false negatives (Supplementary Figure S2, Supplementary Table, ‘overview’). These disparities have obstructed comparative genomic approaches in the miRNA field: for example, missing miRNA families have been misinterpreted as secondary losses, thereby calling into question the fundamental conservation of miRNA families (76). However, very similar miRNA complements in terms of total miRNA genes and miRNA families are observed in closely related groups in MirGeneDB (Figure 3), supporting earlier evolutionary studies arguing for the utility of miRNAs as excellent phylogenetic markers (11,41,57,74) (Figure 1).
Figure 3.

Metazoan miRNA complements are homogeneous between closely related species. Top: miRBase community-report based complements show high heterogeneity in the numbers of families (red) and genes (blue) for closely related species. For instance, in miRBase, human and macaque differ by 1300 genes (Hsa: 1917, Mml: 617) and 1081 families (Hsa: 1543, Mml: 462). Bottom: MirGeneDB's curated complements are homogeneous for both gene and family numbers (see Supplementary Figure S3 for conserved families and genes in comparison to novel families and genes). For instance, in MirGeneDB, human and macaque differ by 55 genes (Hsa: 556, Mml: 501) and only one conserved family (Hsa: 206, Mml: 205). Asterisks mark species that are found in MirGeneDB, but not in miRBase.

Thus, while it is inevitable that some cell-type specific or lowly expressed miRNAs are missing from our annotations, MirGeneDB can be considered essentially free of false positives. Further, because MirGeneDB is focused on identification of miRNA genes and families, rather than sequences (23), a bona fide miRNA gene identified in one taxon is identified as such in all, in contrast to miRBase, where the same gene can be identified as generating a high-confidence miRNA sequence in one taxon, but a low-confidence sequence in another (23). Hence, we are confident that there are few (if any) missing miRNA genes that are conserved between two (or more) of the 45 currently included taxa.
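The false positive and false negative counts discussed in this section follow from a set comparison in which the curated annotation is treated as the reference. The sketch below illustrates that logic on toy data and then checks the quoted 19% figure. The function name and gene identifiers are hypothetical, and it assumes that entries from the two databases have already been matched to common identifiers, which is the difficult step in practice.

```python
def compare_annotations(curated, community):
    """Compare two sets of gene identifiers, treating `curated` as the reference."""
    curated, community = set(curated), set(community)
    return {
        "shared": curated & community,
        # Present in the curated set but absent from the community set.
        "false_negatives": curated - community,
        # Present in the community set but not supported by curation.
        "false_positives": community - curated,
    }


# Toy identifiers, for illustration only.
result = compare_annotations({"gene_a", "gene_b", "gene_c"}, {"gene_b", "gene_c", "gene_d"})
print(sorted(result["false_negatives"]))  # ['gene_a']
print(sorted(result["false_positives"]))  # ['gene_d']

# Figures quoted in the text: 2097 of the 10 899 MirGeneDB 2.0 genes are absent from miRBase.
print(round(100 * 2097 / 10899))  # 19
```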

IMPROVED WEB INTERFACE OF MirGeneDB

The expanded web-interface of MirGeneDB 2.0 allows browsing (http://mirgenedb.org/browse), searching (http://mirgenedb.org/search) and now also downloading (http://mirgenedb.org/download) of miRNA-complements for each organism, in addition to a general information page about the criteria used for miRNA annotation (http://mirgenedb.org/information), as well as false negatives for each taxon (where known), and links to previous versions of MirGeneDB. On the browse-pages for each organism (e.g. http://mirgenedb.org/browse/hsa), a table is available that includes MirGeneDB IDs and miRBase IDs (if available), family- and seed-assignment and the strandedness of the miRNA (i.e. whether the mature arm is the 5p arm, the 3p arm, or both) (Figure 4, ‘A’); overview information on location in the genome (Figure 4, ‘B’); and the phylogenetic origin of each miRNA locus and family (Figure 4, ‘C’). The new features in MirGeneDB 2.0, including the 3′ NTU’s and sequence motifs (see Figure 2) are also indicated (Figure 4, ‘D’). Finally, a heatmap of the expression of each miRNA for all available tissues is available to orient users on expression patterns (Figure 4, ‘E’).
Figure 4.

Improved web interface of MirGeneDB. For each species in MirGeneDB an overview browse page exists that lists all genes. For each gene the following information is provided and sortable: hyperlinked names (both MirGeneDB ID and miRBase ID linking to MirGeneDB and miRBase, respectively), family- and seed- assignments, and arm preference (A), genomic coordinates (B); inferred phylogenetic origin of both the gene locus and family (C); information on the presence or absence of 3′ NTU’s and sequence motifs (D); and a normalized heatmap for available datasets (E).

From here, gene-pages for each miRNA gene can be opened (e.g. http://mirgenedb.org/show/hsa/Let-7-P1a) that contain names, family and seed, orthologues and paralogues, sequences, such as the mature seeds, structure, and a range of other information, including genomic coordinates with hyperlinks to the UCSC or Ensembl genome browsers when available. Further, interactive read-pages are also provided for each gene (e.g., http://mirgenedb.org/static/graph/hsa/results/Hsa-Let-7-P1a.html) that show an overview of read-stacks on the corresponding extended precursor sequence of each gene-page. These pages contain detailed representations of templated and 3′-end non-templated reads for individual datasets for each gene, including reports on miRNA isoforms and downloadable read-mappings, and the information can be used to quantify expression of any miRNA across known data sets (e.g. Figure 2B). For the miRNA repertoire of each species, the members of all miRNA families, or for each miRNA entry, we provide sub-annotations of the precursor, mature, loop, co-mature, star and seed sequences. In addition, we also provide 30-nucleotide flanking regions on both arms for each miRNA to generate an extended precursor transcript for the discovery of regulatory sequence motifs, and lastly separate annotations of seed sequences. On the search-pages, these annotations can be searched independently, either by sequence using Blast (77), the MirGeneDB name, or, if existing, by the full miRBase name (78). Users can also search specific 7-nt seed sequences, and all searches can be done either for individual species or over the entire database. Finally, on the download-pages, fasta, gff, or bed-files for all miRNA components are downloadable for each species.
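The page addresses quoted in this article follow a regular pattern, which the following sketch reproduces. It only assembles URL strings and makes no network requests. The helper names are hypothetical, and the patterns are inferred from the example URLs given in the text rather than from a documented API, so the live site may differ.

```python
from urllib.parse import urlencode

BASE_URL = "http://mirgenedb.org"


def browse_url(species, family=None, seed=None):
    """Build a browse-page URL for a species code, optionally filtered by family or seed."""
    query = {}
    if family:
        query["family"] = family
    if seed:
        query["seed"] = seed
    url = f"{BASE_URL}/browse/{species}"
    return f"{url}?{urlencode(query)}" if query else url


def gene_url(species, gene):
    """Build a gene-page URL from a species code and a gene name as used on the site."""
    return f"{BASE_URL}/show/{species}/{gene}"


print(browse_url("hsa"))                  # http://mirgenedb.org/browse/hsa
print(browse_url("cbr", family="LET-7"))  # http://mirgenedb.org/browse/cbr?family=LET-7
print(gene_url("hsa", "Let-7-P1a"))       # http://mirgenedb.org/show/hsa/Let-7-P1a
```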

MiRNA NOMENCLATURE

Following Ambros et al. (34), MirGeneDB 2.0 employs an internally consistent nomenclature system where genes of common descent are assigned the same miRNA family name, allowing for the easy recognition of both orthologues in other species, and paralogues within the same species, as described earlier (23). The advantages (and limitations) of the nomenclature system employed by MirGeneDB are exemplified by the LET-7 family of miRNAs (Figure 5). Let-7 is an ancient miRNA gene that evolved sometime after the bilaterian split from cnidarians, but before the divergence between protostomes and deuterostomes (79,80), and was (and, in many taxa, still is) syntenically linked to two MIR-10 family members (mir-99/100 and mir-125). However, before the last common ancestor of urochordates and vertebrates (collectively called the Olfactores (81)), this original gene duplicated, generating two paralogues, one still linked to the two MIR-10 genes (paralogue 1, light gray box), and a second, now located elsewhere in the genome (paralogue 2, dark gray box) (82), that is mono-uridylated at the 3′ end (66). This second paralogue likely duplicated several times before the divergence between urochordates and vertebrates, and then, early in vertebrate evolution, the entire genome duplicated twice. Thus, the last common ancestor of gnathostomes had three clusters of P1 with one Let-7 gene and four clusters of P2, each consisting of 2–3 Let-7 genes (Figure 5). This was followed by breakage of some of the clusters and the loss of the fourth cluster in some taxa, in particular therian mammals. Although both the urochordates and the vertebrates have multiple linked P2 Let-7 genes, none of the vertebrate genes can be directly orthologized with any of the five P2 genes in urochordates, and thus these five genes are called ‘orphans’ (23) in the urochordate to highlight this fact. However, if new information comes to light that allows for robust phylogenetic insight, these names will be changed accordingly.
Figure 5.

Nomenclature comparison between MirGeneDB and miRBase for representative chordate Let-7s. Shown is the accepted topology (81) for the three major subgroups of chordates, and for each taxon, an (unscaled) representation of the genomic organization of its Let-7 genes/sequences. MirGeneDB names are shown below each of the loci symbols, and the miRBase sequence names are above. The primitive condition is to possess a single Let-7 gene linked to the two Mir-10 genes (light gray box), as is still found in many bilaterian taxa. In the amphioxus Branchiostoma floridae, this single Let-7 duplicated, and this new paralogue is now positioned at the 3′ end of the cluster. In the Olfactores there is a separate gene duplication event generating another paralogue that is not linked to the original Let-7 cluster in any known urochordate, like Ciona intestinalis, or any vertebrate, including human (H. sapiens) and the platypus (O. anatinus). Further distinguishing this paralogue is that in all Olfactores these Let-7 genes (shown in the dark gray boxes) are Group 2 miRNAs, each with an untemplated mono-uridylated 3′ end (green circles) (see (66)). False negatives (i.e. loci present and transcribed that are present in MirGeneDB, but not in miRBase) are shown in blue. A single false positive (i.e. a sequence present in miRBase, cin-let-7e, but without a corresponding locus in the genome) is shown in red. Note that let-7e also names two sequences derived from two non-orthologous genes in human and platypus: a canonical Group 1 Let-7 (Let-7-P1b) in human, but a Group 2 miRNA (Let-7-P2a4) in platypus. This locus is also present in diapsids (birds and ‘reptiles’), as well as in the teleost fish Danio rerio, but is lost in therian (i.e. placental and marsupial) mammals (see also (82)). Despite the fact that the monophyly of these Group 2 Let-7s in Olfactores appears robust, how the ancestral cluster of the three Let-7-P2s in vertebrates is related to the five linked P2 genes in C. intestinalis remains unknown. Hence, MirGeneDB identifies these genes with this phylogenetic opacity in mind.

The nomenclature system employed by MirGeneDB has several distinct advantages. First, non-orthologous genes are never given the same name. For example, both human and platypus have let-7e sequences, but let-7e in human is derived from the ancestral P1 gene, is linked to MIR-10 genes, and is a Group 1 miRNA; let-7e in platypus is derived from the ancestral P2 gene, is not linked to MIR-10 genes, is mono-uridylated at its 3′ terminus, and, maybe most importantly, is a gene lost in all therian (i.e. placental and marsupial) mammals (Figure 5). Second, simply from the name, one can get an accurate picture of the evolutionary history of the gene within the context of a monophyletic miRNA family (23). For example, there are two Let-7 genes in the amphioxus Branchiostoma floridae, a close chordate relative of urochordates and vertebrates. Although they are called let-7a-1 and let-7a-2, the same names employed by two human miRNAs, they are in fact amphioxus-specific gene duplicates of the MIR-10 associated Let-7 gene. MirGeneDB then necessarily identifies them accordingly, naming them Bfl-Let-7-P3 and Bfl-Let-7-P4 to distinguish these unique paralogues from the two Let-7 paralogues (P1 and P2) of Olfactores (Figure 5). A third advantage to this system is that misnamed genes will not be orphaned in literature searches or functional studies. For example, one of the 12 human Let-7 paralogues was originally named mir-98 (see Figure 5), and although miRBase lists this gene correctly within the LET-7 family, this is not obvious from the name itself. Notably, in the latest release, applying a novel text mining approach for literature searches, the miRBase authors state that there are only 11 Let-7 family members in human, failing to account for Mir-98 (39). This example clearly highlights the importance of consistent naming and the risks of non-uniform nomenclature systems. Finally, because MirGeneDB uses this natural classification and nomenclature system, it allows for an accurate reconstruction of ancestral miRNA repertoires, both at the family level and at the gene level, that is now provided in MirGeneDB 2.0 for all nodes leading to the 45 terminal taxa considered. This allows users to easily assess both gains and losses of miRNA genes and families through time. Again, with respect to the LET-7 family, it is clear that therians lost two ancestral Let-7 genes, genes that are still retained in platypus (Let-7-P2a4 and -Pb4, see Figure 5) and were present in their last common ancestor.
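Because family membership is encoded in the name, it can in principle be read off programmatically. The sketch below decomposes names of the shape quoted in this article (species prefix, family, optional paralogue label) and compares families. The regular expression and function names are hypothetical and cover only those name shapes. The actual MirGeneDB naming rules include further cases, such as the ‘v’ variant and ‘-as’ antisense labels, that are not handled here.

```python
import re

# Hypothetical pattern covering only name shapes quoted in the article,
# e.g. "Hsa-Let-7-P1a", "Bfl-Let-7-P3" and "Hsa-Mir-203".
NAME_PATTERN = re.compile(
    r"^(?P<species>[A-Z][a-z]{2})-"
    r"(?P<family>[A-Z][a-z]+-\d+)"
    r"(?:-(?P<paralogue>P\d+[a-z]?\d*))?$"
)


def parse_name(name):
    """Split a name into species, family (upper-cased) and paralogue label, or return None."""
    match = NAME_PATTERN.match(name)
    if match is None:
        return None
    parts = match.groupdict()
    parts["family"] = parts["family"].upper()
    return parts


def same_family(name_a, name_b):
    """True if both names parse and carry the same family name."""
    a, b = parse_name(name_a), parse_name(name_b)
    return a is not None and b is not None and a["family"] == b["family"]


print(parse_name("Hsa-Let-7-P1a"))
# {'species': 'Hsa', 'family': 'LET-7', 'paralogue': 'P1a'}
print(same_family("Hsa-Let-7-P1a", "Bfl-Let-7-P3"))  # True
print(parse_name("let-7e"))  # None
```

A sequence-style name such as let-7e does not parse under this pattern, since it carries neither a species prefix nor a paralogue label.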

FUNCTIONAL CLASSIFICATION OF MiRNA-SEEDS

The binding and repression of miRNA targets is primarily mediated by the reverse complementarity between the miRNA seed (nucleotide positions 2–8 of the mature miRNA) and the corresponding target region (83,84). Although highly conserved across vast distances of geologic time, seed sequences can and do change (11,23), expanding the functional repertoire of an ancestral seed sequence. Further, because there are only 16,384 (4^7) possible seed sequences, sequence space is highly limited, making convergence between evolutionarily independent miRNA families inevitable. For example, in Caenorhabditis briggsae, there are four LET-7 paralogues (http://mirgenedb.org/browse/cbr?family=LET-7) that all share the seed ‘GAGGUAG’. Interestingly, however, when listing all miRNAs with this seed (http://mirgenedb.org/browse/cbr?seed=GAGGUAG), eight genes from C. briggsae are listed, including four paralogues of the MIR-7594 family. These genes, though, house the mature sequence, and hence the seed sequence, on the 3p arm, as opposed to the 5p arm as found in LET-7 genes, and thus are a clear case of evolutionary convergence. Nonetheless, because there might be some interesting functional overlap between the Let-7 and Mir-7594 sequences, MirGeneDB now also has a ‘seed’ category for each miRNA that summarizes all miRNA entries with the exact same seed sequence (Figure 4, ‘A’). This interface allows the user to find all miRNAs with an identical seed within a given species, or among all MirGeneDB organisms, in both orthologous and non-orthologous genes. Further, different seeds in similarly named genes allow the user to easily recognize divergence of the seed sequence itself. Finally, a search function is provided that allows the user to search for any known seed sequence.
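The seed definition and the size of seed space can be made concrete with a short sketch. It extracts positions 2–8 from a mature sequence and groups entries by seed, which mirrors what a ‘seed’ category does. The function names are hypothetical. The first sequence is the widely reported mature let-7 sequence, and the second is an invented sequence used only to show a shared seed.

```python
from collections import defaultdict


def seed_of(mature):
    """Return nucleotide positions 2-8 (1-based) of a mature miRNA sequence as RNA."""
    if len(mature) < 8:
        raise ValueError("mature sequence must be at least 8 nucleotides long")
    return mature[1:8].upper().replace("T", "U")


def group_by_seed(matures):
    """Map each seed to the list of entry names whose mature sequence carries it."""
    groups = defaultdict(list)
    for name, sequence in matures.items():
        groups[seed_of(sequence)].append(name)
    return dict(groups)


# Four possible nucleotides at each of seven positions.
print(4 ** 7)  # 16384

entries = {
    "let-7 (widely reported mature sequence)": "UGAGGUAGUAGGUUGUAUAGUU",
    "unrelated entry (invented sequence)": "AGAGGUAGCCCAUUGGAUCGGA",
}
print(seed_of(entries["let-7 (widely reported mature sequence)"]))  # GAGGUAG
print(len(group_by_seed(entries)["GAGGUAG"]))  # 2
```

Two entries falling into the same seed group says nothing about common descent on its own. As in the C. briggsae example, the arm that houses the mature sequence and the family assignment are still needed to tell convergence from homology.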

FUTURE DEVELOPMENTS

The establishment of this carefully curated database of miRNA genes, supplementing existing databases, including miRBase and miRCarta, represents a stable and robust foundation for reproducible miRNA research, in particular studies that rely on cross-species comparisons to explore the roles miRNAs play in development and disease, as well as the evolution of miRNAs and animals themselves. Our long-term goal is to have a wider representation of metazoan species, and for each of these organisms a large number of comparable datasets for a comprehensive set of organs, tissues and cell types. We hasten to stress that although the ∼11 000 genes currently in MirGeneDB have been hand curated, mistakes are inevitable. These include species-specific false positives, false negatives (bona fide genes that were missed), processing errors, mistakes in understanding evolutionary history (possibly resulting in nomenclature errors), and other factors. We ask the community to alert us to any such errors, as only through community-wide collaboration can they be eliminated from the database, and MirGeneDB promises to resolve any errors in a timely fashion.

DATA AVAILABILITY

All MirGeneDB data are publicly and freely available under the Creative Commons Zero license. Data are available for bulk download from http://mirgenedb.org/download. All previous versions of MirGeneDB can be found under the Information tab (http://mirgenedb.org/information). Feedback on any aspect of the MirGeneDB database is welcome by email to BastianFromm@gmail.com or Kevin.J.Peterson@dartmouth.edu, or via Twitter (@MirGeneDB).