The sequence structures of human microRNA molecules and their implications.

Zhide Fang, Ruofei Du, Andrea Edwards, Erik K Flemington, Kun Zhang.

Abstract

The count of the nucleotides in a cloned, short genomic sequence has become an important criterion to annotate such a sequence as a miRNA molecule. While the majority of human mature miRNA sequences consist of 22 nucleotides, there exists discrepancy in the characteristic lengths of the miRNA sequences. There is also a lack of systematic studies on such length distribution and on the biological factors that are related to or may affect this length. In this paper, we intend to fill this gap by investigating the sequence structure of human miRNA molecules using statistics tools. We demonstrate that the traditional discrete probability distributions do not model the length distribution of the human mature miRNAs well, and we obtain the statistical distribution model with a decent fit. We observe that the four nucleotide bases in a miRNA sequence are not randomly distributed, implying that possible structural patterns such as dinucleotide (trinucleotide or higher order) may exist. Furthermore, we study the relationships of this length distribution to multiple important factors such as evolutionary conservation, tumorigenesis, the length of precursor loop structures, and the number of predicted targets. The association between the miRNA sequence length and the distributions of target site counts in corresponding predicted genes is also presented. This study results in several novel findings worthy of further investigation that include: (1) rapid evolution introduces variation to the miRNA sequence length distribution; (2) miRNAs with extreme sequence lengths are unlikely to be cancer-related; and (3) the miRNA sequence length is positively correlated to the precursor length and the number of predicted target genes.

Substances：
MicroRNAs
RNA Precursors

Year: 2013 PMID: 23349828 PMCID: PMC3548844 DOI: 10.1371/journal.pone.0054215

Source DB: PubMed Journal: PLoS One ISSN: 1932-6203 Impact factor: 3.240

Introduction

MicroRNAs (miRNAs) have been identified as a group of small endogenous non-coding RNAs that negatively regulate protein-coding messenger RNAs (mRNAs) at the post-transcriptional level. The biogenesis and main activity of a miRNA are well described in the literature. Mature miRNAs are single-stranded RNAs of about 22 nucleotides, derived from longer non-coding primary miRNAs (pri-miRNAs) and then from precursor miRNAs (pre-miRNAs) by the sequential actions of the Drosha and Dicer RNA-cleaving enzymes [1]–[3]. The main function of miRNAs is to interfere with the translation of mRNAs or to induce their degradation. In mammals, mature miRNAs are incorporated into an RNA-induced silencing complex (RISC). The activated RISC permits the miRNAs to bind to the 3′ untranslated regions (3′UTR) of specific target mRNAs to suppress translation and cause their degradation by mRNA decay [4]–[6]. The correspondence between miRNAs and targeted mRNAs need not be one-to-one: a single miRNA may have multiple mRNA targets. Predicting the targeted mRNAs of a miRNA is challenging, though precise prediction is essential for studying its functional activity and its association with diseases. The biogenesis of a miRNA molecule and its main activity are depicted in Figure 1.

Figure 1

Biogenesis of mature miRNAs and their functional activity.

To annotate a cloned sequence as a miRNA, the most important criteria include the characteristic length (approximately 22 nucleotides) of the sequence and a corresponding compact pre-miRNA loop structure, with a median of 83 nucleotides [7]. The association between biological significance and sequence length heterogeneity has recently been recognized for a mature miRNA in Arabidopsis thaliana [8]. That study shows the importance of studying the distributional structure of the sequence lengths of mature miRNA molecules in the genome, and the factors that may affect this length heterogeneity. With the development of profiling technology and the advances of bioinformatics/computational tools, the number of miRNAs identified has increased dramatically. Since the first miRNA was discovered in 1993 [9] and the biological functions of miRNAs were recognized to be conserved in different species in 2000 [10], [11], the number of mature human miRNAs jumped from 152 in August 2004 to 1732 in April 2011, according to miRBase, a database of published miRNA sequences and annotation [12]–[16]. In this paper, we systematically investigate the length distribution of miRNAs, anticipating that the non-uniformity of this distribution can reveal the complexity of the miRNA molecular structures and have implications for genetic research.

Materials and Methods

Materials

The sequences of 1732 human mature miRNAs and the corresponding precursor miRNAs were downloaded from the public database miRBase (Release 17, April 2011). All the calculations were carried out using the R language.

Statistical Methods

A random variable is defined to have an asymmetric Laplace distribution if it has a density of the three-parameter form given in [17]. The density is written with an indicator function, and it reduces to the symmetric Laplace distribution for a particular value of one of the parameters. The maximum likelihood estimates of these parameters are available in [17]. With these estimates, the fitted discrete asymmetric Laplace distribution, DALaplace, assigns a probability mass to each length k, where k ranges from 16 to 27 (the range of the sequence lengths of human mature miRNAs). The discrete symmetric Laplace distribution (DLaplace) is defined in the same fashion.

A zero-inflated Poisson model is defined as

$$P(Y = y) = \pi\, I(y = 0) + (1 - \pi)\,\frac{e^{-\mu}\mu^{y}}{y!},$$

where $y$ is a non-negative integer, $I(\cdot)$ is the indicator function, and $\pi$ and $\mu$ are unknown parameters. This is a mixture model: it reduces to the Poisson distribution with mean $\mu$ when $\pi = 0$, or to a single-point distribution putting all its mass at zero when $\pi = 1$. We fitted this model to the absolute differences between the mature miRNA sequence lengths and their median, and obtained the maximum likelihood estimates of the two parameters [18]. The discrete, symmetric zero-inflated model (DSZero-Inf) is then defined on the lengths k through the fitted model for |k − m|, where k ranges from 16 to 27, and m is the median of the observed sequence lengths of human mature miRNAs.

The tPoisson distribution is defined as

$$P(X = k) = c\,\frac{e^{-\lambda}\lambda^{k}}{k!},$$

where k ranges from 16 to 27, λ is the average sequence length of all human mature miRNAs, and c is a constant such that the sum of all probabilities is one.
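As a rough illustration of the zero-inflated and truncated Poisson definitions above, the following Python sketch evaluates both probability mass functions. The paper's calculations were done in R; the function names and the symbols `pi` and `mu` are illustrative choices, and the value 21.52 is the sample mean length reported later in the text.

```python
import math

def zip_pmf(y, pi, mu):
    """Zero-inflated Poisson: a point mass at zero mixed with Poisson(mu)."""
    poisson = math.exp(-mu) * mu**y / math.factorial(y)
    return pi * (y == 0) + (1.0 - pi) * poisson

def truncated_poisson_pmf(lam, lo=16, hi=27):
    """Poisson weights restricted to lo..hi, rescaled so they sum to one."""
    weights = {k: math.exp(-lam) * lam**k / math.factorial(k)
               for k in range(lo, hi + 1)}
    total = sum(weights.values())  # 1/total plays the role of the constant c
    return {k: w / total for k, w in weights.items()}

# tPoisson on the observed length range, using the reported mean length.
tp = truncated_poisson_pmf(21.52)
tp_mode = max(tp, key=tp.get)
```

The rescaling step is what the constant c does in the tPoisson definition; everything else is the ordinary Poisson formula.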

Results and Discussion

The Distribution of the Sizes of Human Mature miRNA Molecules

The number of nucleotides in a human mature miRNA is a discrete random variable, which ranges from 16 to 27 and has a mode and a median of 22. A histogram of the lengths of all human mature miRNA molecules is presented in Figure 2. Though the Poisson distribution is the traditional model for count data, it does not fit the length distribution of mature miRNA molecules well. Figure 2 also shows the Poissonness plot of the data [19]. It is created by plotting $\log(k!\,n_k)$ against $k$, where $k$ is the count, $n_k$ is the corresponding observed frequency and $k!$ represents the factorial of $k$. For Poisson data with mean λ, these points fall close to a straight line with slope log λ. It is clear that the plotted points do not fall onto a straight line, with the points in the middle above the line and the points at both ends below the line. This suggests that a non-Poisson distribution should be employed to fit the length distribution of mature miRNAs.
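The Poissonness plot described above can be sketched as follows. This is a minimal Python illustration, not the authors' R code; the frequencies are hypothetical (exact Poisson expectations) and are used only to show that truly Poisson data give a straight line.

```python
import math

def poissonness_points(freq):
    """Return (k, log(k! * n_k)) for every count k with positive frequency n_k."""
    return [(k, math.lgamma(k + 1) + math.log(n))
            for k, n in sorted(freq.items()) if n > 0]

# Hypothetical frequencies: exact Poisson(22) expectations for N = 1732 sequences.
lam, n_total = 22.0, 1732
expected = {k: n_total * math.exp(-lam) * lam**k / math.factorial(k)
            for k in range(16, 28)}

points = poissonness_points(expected)
# Slopes between consecutive points; for Poisson data each equals log(lam).
slopes = [(y2 - y1) / (k2 - k1)
          for (k1, y1), (k2, y2) in zip(points, points[1:])]
```

Replacing `expected` with the observed length frequencies would give the points plotted in a Poissonness plot; curvature in those points is the departure from the Poisson model that the text describes.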

Figure 2

Histogram and corresponding Poissonness plot of the sequence lengths (sizes) of human mature miRNA molecules.

A unique feature of the Poisson distribution is the equality of its mean and variance. This is not the case for the lengths of human mature miRNA molecules because the sample mean (21.52) is much larger than the sample variance (2.51). This under-dispersion also implies that the negative binomial distribution, another popular model for count data that handles over-dispersion (its variance exceeds its mean), could not fit the human mature miRNA lengths well. Figure 3 shows the fitting results of three discrete distributions to the sequence lengths of human mature miRNA molecules. These include a discrete analogue of the asymmetric Laplace distribution (denoted as DALaplace), a discrete, symmetric distribution induced from the zero-inflated model (denoted as DSZero-Inf) and a truncated Poisson distribution (denoted as tPoisson). Details of DALaplace, DSZero-Inf and tPoisson are discussed in the Materials and Methods. Interested readers are referred to [17] for the definition of the asymmetric Laplace distribution and methods for parameter estimation, and to [19] for the definition and applications of the zero-inflated model. It is clear from the plot that DALaplace performs best while tPoisson is the worst. The sums of squares of the residuals (differences between observed percentages and corresponding fitted values) are 0.0047, 0.01 and 0.175 for DALaplace, DSZero-Inf and tPoisson, respectively, further illustrating the performance of these models. We also calculated the AIC (Akaike information criterion) to evaluate the relative goodness of fit of these non-nested models. The AICs for DALaplace, DSZero-Inf and tPoisson are 5893.396, 6117.659 and 7970.977, respectively. The order of these values confirms our selection of the model.
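The model comparison above can be restated numerically. The short Python sketch below uses only the values reported in the text (the three AICs, the sample mean and the sample variance); the variable names are illustrative.

```python
# AIC values as reported in the text (smaller is better).
aic = {"DALaplace": 5893.396, "DSZero-Inf": 6117.659, "tPoisson": 7970.977}

best = min(aic, key=aic.get)
ranking = sorted(aic, key=aic.get)
# Difference of each model's AIC from the best model's AIC.
delta_aic = {name: value - aic[best] for name, value in aic.items()}

# Variance-to-mean ratio: 1 for a Poisson variable, above 1 for over-dispersion.
dispersion_index = 2.51 / 21.52
```

A dispersion index near 0.12 is far below 1, which is the under-dispersion that rules out both the Poisson and the negative binomial models.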

Figure 3

Histogram of sequence lengths of human mature miRNA molecules and four fitted models.

The Randomness of Bases in Mature Human miRNA Molecules

Another question of interest to biologists is whether there is any structural pattern in a human mature miRNA; in other words, whether the proportion of one nucleotide base is significantly higher or lower than those of other bases. We address this problem in this subsection. Given the length of a mature miRNA sequence, the vector of counts of the bases A, C, G, U follows a multinomial distribution. By the likelihood ratio method for the test of proportional homogeneity, we conclude that the proportions of the four bases in every sequence are significantly different (p-value ≈ 0). We further find that, at the significance level of 0.05, there are 341 (about 20%) mature miRNA sequences showing inequality of base probabilities. The sample proportions of the four bases in all miRNA sequences are presented in Figure 4, with the 95% simultaneous confidence interval [20] at the top of the corresponding bar. These intervals clearly indicate that the four bases are not equally probable in all the sequences. All these findings imply that structural patterns may exist in the sequences of certain mature miRNAs.
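A per-sequence likelihood ratio test of equal base probabilities can be sketched as below. This is an illustrative Python version under stated assumptions, not the authors' procedure: it uses the asymptotic chi-square reference distribution with 3 degrees of freedom (a rough approximation for sequences of about 22 bases), and the example sequence is included only for demonstration.

```python
import math

def chi2_sf_df3(x):
    """Upper-tail probability of a chi-square variable with 3 degrees of freedom."""
    return (math.erfc(math.sqrt(x / 2.0))
            + math.sqrt(2.0 * x / math.pi) * math.exp(-x / 2.0))

def base_homogeneity_test(seq):
    """Likelihood ratio (G) test of H0: P(A) = P(C) = P(G) = P(U) = 1/4."""
    counts = [seq.count(base) for base in "ACGU"]
    expected = len(seq) / 4.0
    # Zero counts contribute nothing to the statistic (0 * log 0 is taken as 0).
    g = 2.0 * sum(c * math.log(c / expected) for c in counts if c > 0)
    g = max(g, 0.0)  # guard against tiny negative rounding error
    return g, chi2_sf_df3(g)

# Illustrative 22-nucleotide sequence with counts A=5, C=0, G=8, U=9.
g_stat, p_value = base_homogeneity_test("UGAGGUAGUAGGUUGUAUAGUU")
```

For this example the statistic is about 13.9, so the hypothesis of equal base probabilities would be rejected at the 0.05 level.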

Figure 4

The sample proportions of nucleotide bases and GC, AU contents.

However, as demonstrated in Figure 4, the content of GC (50.8%) is very close to that of AU (49.2%). The 99.9% confidence interval for the GC content is (0.499, 0.516), which is narrow and covers 0.5. We comment that the hypothesis of the GC content being 50% is not rejected as long as the significance level is set below 0.0028. Such a small p-value for such a small departure from 50% arises because the sample size N (the total number of bases in all mature miRNA sequences) is large and because, in the hypothesis testing of a proportion, the p-value goes to zero as N approaches infinity for any fixed departure from the hypothesized value.
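The GC-content calculation can be approximated as follows. In this Python sketch the total number of bases is an assumption (1732 sequences times the reported mean length of 21.52), and the GC proportion is the rounded value 0.508, so the resulting p-value only approximates the threshold quoted in the text.

```python
import math
from statistics import NormalDist

n_bases = round(1732 * 21.52)  # assumed total base count, not given in the text
gc_hat = 0.508                 # rounded GC proportion from the text

# Two-sided z-test of H0: GC proportion = 0.5.
se_null = math.sqrt(0.25 / n_bases)
z = (gc_hat - 0.5) / se_null
p_two_sided = math.erfc(abs(z) / math.sqrt(2.0))

# 99.9% Wald confidence interval for the GC proportion.
z_crit = NormalDist().inv_cdf(1 - 0.001 / 2)
half_width = z_crit * math.sqrt(gc_hat * (1 - gc_hat) / n_bases)
ci_999 = (gc_hat - half_width, gc_hat + half_width)
```

With these inputs the interval is roughly (0.4995, 0.5165), in line with the reported (0.499, 0.516), and the p-value is of order 0.002.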

The Relationship to Evolutionary Conservation

Highly conserved DNA sequences are thought to have functional value. Genetic conservation across evolution has been an important benchmark for detecting functionally important nucleic acid sequences, and for studying gene interactions in a group of co-regulated genes [21]–[24]. Hirsh and Fraser [25] revealed a negative and highly significant relationship between the importance of a gene and its evolutionary rate. A similar relationship for miRNAs has also been studied in the literature. Zhang et al. [26] reported the rapid evolution of some miRNA clusters. In this subsection, we present our findings on the correlation between evolutionary conservation and the length of mature miRNA molecules. To our knowledge, this is the first study exploring this relationship. All human mature miRNAs are divided into two classes, conserved and human-specific, by using the procedure documented in [27]. Out of 1732 mature miRNAs, 914 (about 52.8%) are labeled as conserved and 818 (about 47.2%) as human-specific. These two proportions are significantly different (one-sided p-value is 0.01). Figure 5 shows the length distributions of the sequences in these two groups. We can see that the sequence lengths of conserved miRNAs are symmetrically distributed around 22. Both the discrete, symmetric, zero-inflated distribution (DSZero-Inf) and the discrete, symmetric Laplace distribution (DLaplace) model the distribution decently, and there is little difference between these two models. In contrast, the sequence lengths of human-specific miRNAs seem to be bimodally distributed, with modes of 16 and 22. One may need a mixture of two distributions to model this variable well. The percentage (7.3%) of short human-specific miRNAs with a length of 16 or 17 is about ten times that (0.77%) of short conserved miRNAs (a Z-test for equality of two percentages gives a p-value close to 0).
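The two proportion tests in this paragraph can be reproduced approximately with a normal approximation. In the Python sketch below, the class sizes 914 and 818 come from the text, while the counts of short miRNAs (60 and 7) are assumptions back-calculated from the rounded percentages 7.3% and 0.77%.

```python
import math

def upper_tail(z):
    """P(Z > z) for a standard normal variable."""
    return 0.5 * math.erfc(z / math.sqrt(2.0))

n_conserved, n_specific = 914, 818
n_total = n_conserved + n_specific

# One-sided test of H0: the conserved share equals 0.5.
z_share = (n_conserved / n_total - 0.5) / math.sqrt(0.25 / n_total)
p_share = upper_tail(z_share)

# Two-proportion z-test for short (16 or 17 nt) miRNAs; counts are assumed.
short_specific, short_conserved = 60, 7
pooled = (short_specific + short_conserved) / n_total
se = math.sqrt(pooled * (1 - pooled) * (1 / n_specific + 1 / n_conserved))
z_short = (short_specific / n_specific - short_conserved / n_conserved) / se
p_short = 2 * upper_tail(abs(z_short))
```

The first p-value comes out near 0.0105, matching the reported one-sided value of 0.01, and the second is vanishingly small, consistent with "a p-value close to 0".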

Figure 5

Histograms and fitted distributions of the sequence lengths of mature conserved and human-specific miRNAs.

All these results indicate that rapid evolution seems to increase the variation in the sequence lengths of human mature miRNA molecules, and thus complicate the distribution of the length variable.

The Characteristic Size of Human miRNA Oncogenes and Tumor Suppressors

It has become evident that miRNAs control the expression levels of gene products that are important in cancer progression. A number of studies have shown that many miRNAs reside within chromosomal fragile sites in the human genome and that many miRNAs have been linked to the initiation, progression, and metastasis of human malignancies, with the earlier reports associating miRNAs with cancers being published in [28], [29]. Some miRNAs are able to target oncogenes – those with capacity to induce tumor migration and invasion, or tumor suppressor genes – those with capacity to suppress cancer and metastasis [30]–[33]. The essence of the miRNA’s regulatory mechanism in cancer lies in that increased expression of certain miRNAs can result in down-regulation of tumor suppressor genes, while decreased expression of other miRNAs can lead to increased expression of oncogenes. Examples include hsa-miR-10B [34] and hsa-miR-21 [35] in breast cancer, and hsa-miR-155 [36] in human B cell lymphomas as oncogenes; and hsa-let-7 [37] in lung cancer, and hsa-miR-15 and hsa-miR-16 [28] in chronic lymphocytic leukemia as tumor suppressors. To investigate the distributions of the sequence lengths of the mature miRNA molecules that are associated with cancer, we generate a class of miRNAs regulating either oncogenes or tumor suppressor genes. For a miRNA to be included, there must be at least one publication indicating the causal relationship between the miRNA and the related oncogene or tumor suppressor gene. We include those miRNAs that play opposite roles in different cancers due to the fact that one miRNA may regulate multiple targets, and the same miRNA may play opposite roles in cancer progression in that it acts as a tumor suppressor in certain cancers and as an oncogene in others [38]. This makes our selection slightly different from that in [16]. 
If no such causal relationship exists, a miRNA is selected as an oncogene if it is up-regulated in at least three publications, or as a tumor suppressor if it is down-regulated in at least three publications. We exclude the miRNAs that show conflicting roles in the same cancer. We obtained 173 cancer-related miRNAs, listed in Table 1, where the function of a miRNA is marked “mixed” if it regulates oncogenes in certain cancers and tumor suppressor genes in other types of cancer. We find that the characteristic sequence lengths of these miRNAs are very stable, with 60.7% of them having sequences of 22 nucleotides, 96.5% having sequences of 22±1 nucleotides, and 99.4% having sequences of 22±2 nucleotides. The only miRNA whose sequence length (18 nucleotides) falls outside the interval 22±2 is hsa-miR-516a-3p. This miRNA has a connection to human breast cancer progression [39]. The length distribution for the miRNAs exclusively regulating oncogenes (or tumor suppressors) is very similar to that of all cancer-related miRNAs. These observations suggest that an extremely long or short miRNA is unlikely to be cancer-related. For a cancer-related miRNA, however, the sequence length does not affect its classification as an oncogene or a tumor suppressor.

Table 1

All human mature miRNAs associated with cancer and their functions.

miRNA	Function	miRNA	Function	miRNA	Function	miRNA	Function
let-7a	supp	miR-148b	supp	miR-21	onco	miR-34b	supp
let-7a-2*	supp	miR-150	mixed	miR-210	mixed	miR-34c-5p	supp
let-7b	supp	miR-152	supp	miR-214	supp	miR-370	supp
let-7c	supp	miR-153	supp	miR-215	supp	miR-372	onco
let-7d	supp	miR-155	onco	miR-216b	supp	miR-373*	onco
let-7e	supp	miR-15a	supp	miR-218	supp	miR-373	onco
let-7f	supp	miR-15b	supp	miR-219-1-3p	onco	miR-374a	onco
let-7f-1*	onco	miR-16	mixed	miR-22	supp	miR-375	mixed
let-7g	supp	miR-16-1*	mixed	miR-221	onco	miR-376a	supp
let-7i	supp	miR-17	onco	miR-222	onco	miR-376b	supp
miR-1	supp	miR-181a	mixed	miR-223	mixed	miR-377	supp
miR-100	supp	miR-181a-2*	onco	miR-224	onco	miR-424	supp
miR-101	supp	miR-181b	supp	miR-23a	mixed	miR-429	supp
miR-106a	onco	miR-181c	supp	miR-23b	supp	miR-432	supp
miR-106b	onco	miR-182	onco	miR-24-1*	onco	miR-449a	supp
miR-107	mixed	miR-182*	onco	miR-24	onco	miR-451	supp
miR-10a	onco	miR-183	supp	miR-24-2*	onco	miR-485-5p	supp
miR-10b	onco	miR-184	onco	miR-25	onco	miR-486-5p	supp
miR-122	supp	miR-185	supp	miR-26a	mixed	miR-494	onco
miR-124	supp	miR-18a	onco	miR-26b	mixed	miR-495	supp
miR-125a-5p	supp	miR-18a*	supp	miR-27a	onco	miR-497	supp
miR-125b	mixed	miR-191	onco	miR-27b	supp	miR-498	onco
miR-125b-1*	supp	miR-192	supp	miR-296-5p	onco	miR-503	onco
miR-125b-2*	supp	miR-193a-3p	supp	miR-29a	supp	miR-510	onco
miR-126*	mixed	miR-193b	supp	miR-29b	supp	miR-516a-3p	onco
miR-126	mixed	miR-194	supp	miR-29b-2*	supp	miR-519a	supp
miR-127-3p	supp	miR-195	supp	miR-29c	supp	miR-520c-3p	onco
miR-128	supp	miR-196a	mixed	miR-30a	mixed	miR-520h	supp
miR-129-5p	supp	miR-196a*	onco	miR-30a*	supp	miR-521	onco
miR-130b	onco	miR-197	onco	miR-30e	supp	miR-532-5p	onco
miR-133a	supp	miR-199b-5p	supp	miR-31	supp	miR-661	supp
miR-133b	supp	miR-19a	onco	miR-32	onco	miR-675	onco
miR-134	supp	miR-19b	onco	miR-320a	supp	miR-7	supp
miR-135a	supp	miR-19b-2*	onco	miR-324-5p	supp	miR-9	mixed
miR-137	mixed	miR-200a	supp	miR-326	supp	miR-9*	onco
miR-138	supp	miR-200b	supp	miR-328	supp	miR-92a	onco
miR-139-3p	supp	miR-200c	mixed	miR-330-3p	supp	miR-93	onco
miR-140-5	supp	miR-203	supp	miR-335	supp	miR-95	supp
miR-141	supp	miR-204	mixed	miR-337-3p	supp	miR-96	onco
miR-143	mixed	miR-205	supp	miR-340	onco	miR-98	onco
miR-145	supp	miR-206	supp	miR-342-5p	supp	miR-99a	supp
miR-146a	mixed	miR-20a	mixed	miR-345	onco
miR-146b-5p	supp	miR-20a*	onco	miR-346	onco
miR-148a	supp	miR-20b	onco	miR-34a	supp

The Relationship to the Size of the pre-miRNA

The stem-loop structure of the precursor miRNA forms before the corresponding mature miRNA. Thus, the association between the biogenesis of a miRNA gene and the sequence features of its stem-loop precursor is also important. First, we study the distribution of the sequence lengths of pre-miRNAs. As presented in Figure 6, this distribution has a median (and mode) of 83 nucleotides, but it is skewed to the right. A normal model with mean 4.41 (log(83)) and standard deviation 0.16 fits the logarithm of the sequence lengths very well (the red curve in Figure 6 is the fitted model). This indicates that a log-normal distribution can be employed to model the length distribution of the human pre-miRNAs. A useful property is that the log-normal distribution has maximum entropy among distributions whose logarithms have fixed mean and variance [40].
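To make the fitted log-normal model concrete, the Python sketch below derives a few summary values from the reported parameters (mean 4.41 and standard deviation 0.16 on the log scale). The variable names are illustrative, and the numbers are implied by the fitted model rather than taken from the data.

```python
import math
from statistics import NormalDist

# Fitted normal model for log(pre-miRNA length), parameters from the text.
log_len = NormalDist(mu=4.41, sigma=0.16)

median_len = math.exp(log_len.mean)            # median of the log-normal
mean_len = math.exp(4.41 + 0.16**2 / 2)        # mean of the log-normal
central_95 = tuple(math.exp(log_len.inv_cdf(q)) for q in (0.025, 0.975))
```

The model mean exceeds the model median, which is the right skew noted in the text, and the central 95% range runs from roughly 60 to 113 nucleotides.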

Figure 6

Distributions of the sequence lengths of human pre-miRNAs.

Next we study the relationship between the sequence lengths of human pre-miRNAs and the corresponding mature miRNAs. The sequence length of pre-miRNAs varies from 41 to 180, and there are multiple pre-miRNA lengths corresponding to each mature miRNA length. We first calculate the average sequence length of the precursors corresponding to the same mature miRNA length, and then fit the regression model of the mature miRNA length to the average precursor length. As shown in Figure 7, there is a positive, significant relationship between the human mature and precursor miRNA sequence lengths (the slope of the red line, 0.388, is statistically significant). The multiple R² of the regression is 81%. The improvement due to the quadratic polynomial model fit (the blue curve) is not significant, with the p value of the quadratic term equal to 0.539. We also obtained the maximal information coefficient (MIC) and related statistical indexes proposed by Reshef et al. [41]: MIC = 0.65, MIC − ρ² (a measure of non-linearity, where ρ is the Pearson correlation coefficient) = −0.15, MAS (the maximum asymmetry score for non-monotonicity) = 0 and MCN (minimum cell number for complexity) = 2. By comparing these indexes with those in Table S1 in [41], we can conclude that there is a linear association between these two variables, the sequence lengths of human precursor miRNA and mature miRNA.
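The regression step can be sketched with ordinary least squares. In this Python illustration the five data points are hypothetical (the per-length averages are not listed in the text), and the choice of mature length as the x variable follows the Figure 7 caption and is an assumption; only the final line uses reported values.

```python
def simple_ols(x, y):
    """Least-squares slope, intercept and R^2 for a single predictor."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    slope = sxy / sxx
    return slope, my - slope * mx, sxy**2 / (sxx * syy)

# Hypothetical averages, for illustration only.
mature_len = [18, 20, 22, 24, 26]
avg_precursor_len = [80.0, 81.0, 81.5, 82.5, 83.0]
slope, intercept, r2 = simple_ols(mature_len, avg_precursor_len)

# Reported values: MIC = 0.65 and R^2 = 0.81, so MIC - rho^2 is about -0.16.
nonlinearity = 0.65 - 0.81
```

For a single predictor, R² equals the squared Pearson correlation, so the difference of about −0.16 computed from the rounded figures agrees with the reported −0.15 up to rounding.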

Figure 7

Scatter-plot of the average sequence length of pre-miRNAs versus the sequence length of miRNAs: the red line is the linear regression and the blue curve is the quadratic polynomial regression.

Lastly, we look into where the human mature miRNA resides within its stem-loop precursor: in the 5′ arm or the 3′ arm. We scrutinize 1732 mature miRNAs in total, with hsa-miR-378d excluded from the analysis because it is located in both the 3′ arm of the stem-loop precursor hsa-mir-378d-1 and the 5′ arm of the stem-loop precursor hsa-mir-378d-2. There is no significant difference between the percentages of miRNAs in the 5′ arm (49.4%) and the 3′ arm (50.6%), and both sequence length distributions are symmetric around approximately 22 nucleotides. However, as Figure 8 indicates, a difference exists between the sequence length distributions of miRNAs residing in the two arms of the precursors. Longer mature miRNAs (with more than 22 nucleotides) are more often located in the 5′ arm than in the 3′ arm of the precursors. Among the miRNAs located in the 5′ arm and the 3′ arm separately, the percentages of miRNAs with exactly 22 nucleotides are significantly different (46.2% in the 5′ arm versus 50.3% in the 3′ arm, with a p value of 0.044).

Figure 8

Bar charts of mature miRNAs in the 5′ and 3′ arms of the precursors respectively.

The Association of miRNA Sequence Length and the Number of Predicted Target Genes

A miRNA is a non-coding, functional RNA molecule. Its role in post-transcriptional gene regulation is carried out by binding to the target mRNA and then destabilizing the mRNA or suppressing its translation. In most cases, one miRNA does not bind to a single mRNA but instead binds to multiple targets. The number of predicted target genes varies dramatically from one miRNA to another. Whether this number is associated with the sequence length of the miRNA molecule may have implications for genetic research and is worth studying. We retrieved all of the predicted targets from the web tool miRDB [42], [43]. Four mature miRNAs (hsa-miR-3124-5p, hsa-miR-3647-3p, hsa-miR-3647-5p, hsa-miR-3648) were excluded from the analysis because none of them has a predicted target gene according to this tool. Many miRNAs have the same sequence length. Each of the remaining miRNAs binds to at least one gene. Figure 9 presents the average numbers of predicted target genes versus the mature miRNA sequence length. Generally speaking, the plot shows that the average number of target genes is positively correlated with the sequence length of the miRNA, and this relationship is statistically significant when considering only the miRNAs with sequence lengths between 17 and 25 (the regression line (red) has a positive slope, 18, with a p value of 0.003), indicating that longer miRNAs tend to regulate more genes. We comment that there are only 17 miRNAs with a sequence length of 16, 26 or 27. Nevertheless, this positive correlation is an interesting observation whose biological significance warrants further investigation. Increased miRNA length raises the possibility that the extra nucleotides are involved in facilitating protein complex formation. This could take the form of aiding more stable interactions with core RISC complexes or of interactions with additional regulatory co-factors. An interesting model for the latter is that these additional factors regulate targeting to different subsets of 3′ UTRs depending on the cell phenotype. In this scenario, a broader target repertoire can exist, but tissue-specific co-factors dictate the subset of targets in a particular tissue.

Figure 9

Scatter-plot of the average numbers of predicted target genes versus the mature miRNA sequence length.

The Relationship between miRNA Sequence Length and the Distribution of Target Binding Sites

In this subsection, we look into the relationship between the lengths and the seed sequences of miRNAs. The motivation for this analysis is that miRNA target prediction depends heavily on the binding between the seed sequence of the miRNA and the 3′ UTR sequences of the targeted mRNAs [44]. Meanwhile, the non-seed sequence may play some role in the processing of the precursor and/or the association with RISC proteins. We downloaded the summary counts, including the numbers of (non-)conserved 8mer sites, 7mer-m8 sites and 7mer-1A sites in the targeted transcripts, from the public database TargetScan Human Release 6.1 [45]–[48]. This database contains 1524 mature miRNAs. As specified by the website, 8mer refers to a perfect match to nucleotides 2–8 of the mature miRNA (the seed + nucleotide 8) and an A residue opposite the first nucleotide position of the mature miRNA; 7mer-m8 refers to a perfect match to nucleotides 2–8; and 7mer-1A refers to a perfect match to nucleotides 2–7 plus an A residue opposite the first nucleotide position. If multiple genes are targeted by a miRNA, we calculate the sum of the site counts from each gene and use it as the count of target sites for this miRNA. By doing so, we obtained six numbers, corresponding to (non-)conserved 8mer, 7mer-m8 and 7mer-1A sites, for each miRNA. The distributions of these counts are presented in Figure 10. The left panel shows the combined counts of conserved and non-conserved sites and the right panel shows the counts of conserved sites only. Both panels indicate the same pattern of count distributions. The distributions of target site counts are very similar for miRNAs with sequence lengths from 17 to 26 (more alike across the sequence lengths from 21 to 23), while the distribution of counts from miRNAs with the shortest or longest sequence shows some discrepancy. Interestingly, for combined counts of conserved and non-conserved sites, the counts of 8mer sites are much lower than those of 7mer-m8 sites and 7mer-1A sites, which have the same medians. This pattern changes when only the conserved sites are considered. The median counts of 7mer-1A sites are lowest for miRNAs with sequence lengths from 16 to 26, but highest for miRNAs with sequence length 27.
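The three site types can be illustrated with a small classifier. This Python sketch is not TargetScan's implementation; it assumes the usual convention that a site is read 5′→3′ on the mRNA, that it is the reverse complement of the miRNA seed, and that the A lies on the mRNA opposite miRNA position 1. The function names, the example miRNA and the toy 3′ UTR are illustrative.

```python
COMPLEMENT = {"A": "U", "U": "A", "C": "G", "G": "C"}

def revcomp(rna):
    """Reverse complement of an RNA string."""
    return "".join(COMPLEMENT[b] for b in reversed(rna))

def count_seed_sites(mirna, utr):
    """Count 8mer, 7mer-m8 and 7mer-1A sites for one miRNA in one 3' UTR."""
    core = revcomp(mirna[1:7])   # pairs with miRNA nucleotides 2-7
    m8 = COMPLEMENT[mirna[7]]    # pairs with miRNA nucleotide 8
    counts = {"8mer": 0, "7mer-m8": 0, "7mer-1A": 0}
    start = utr.find(core)
    while start != -1:
        has_m8 = start >= 1 and utr[start - 1] == m8
        has_a1 = start + 6 < len(utr) and utr[start + 6] == "A"
        if has_m8 and has_a1:
            counts["8mer"] += 1
        elif has_m8:
            counts["7mer-m8"] += 1
        elif has_a1:
            counts["7mer-1A"] += 1
        start = utr.find(core, start + 1)
    return counts

mirna = "UGAGGUAGUAGGUUGUAUAGUU"                    # illustrative 22-nt miRNA
toy_utr = "GG" + "CUACCUCA" + "GG" + "CUACCUCG" + "GG" + "GUACCUCA" + "GG"
site_counts = count_seed_sites(mirna, toy_utr)
```

Summing such counts over all predicted target transcripts of a miRNA would give per-miRNA totals of the kind analyzed above; matches to nucleotides 2–7 alone, with neither flanking feature, are deliberately not counted.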

Figure 10

Histograms of target site counts of the transcripts predicted by miRNAs with the same sequence lengths.

Conclusion

Length has become one of the important criteria for annotating a cloned sequence as a miRNA [7]. Though it is commonly understood that a human mature miRNA has about 22 nucleotides, the statistical characteristics of the length distribution of miRNA molecules are not trivial and have been little studied. Based on graphical methods and model selection criteria, we demonstrated that, compared with the conventionally used Poisson distribution, the discrete analogue of an asymmetric Laplace distribution models the length distribution of human mature miRNA molecules well. It has a lower residual sum of squares and a smaller AIC. The association study revealed that the sequence length heterogeneity is related to biological factors such as evolutionary conservation and the miRNA’s regulatory mechanism. We found that highly conserved miRNA sequences have lengths concentrated at 22 nucleotides, while human-specific miRNAs show large variation in length. Furthermore, the miRNAs that regulate oncogenes/tumor suppressors also show stable lengths of 22 nucleotides, and longer miRNAs tend to regulate more genes. These findings may have implications for (cancer) genetics research and warrant additional follow-up studies.

41 in total

1. Protein dispensability and rate of evolution.

Authors: A E Hirsh; H B Fraser
Journal: Nature Date: 2001-06-28 Impact factor: 49.962

2. A uniform system for microRNA annotation.

Authors: Victor Ambros; Bonnie Bartel; David P Bartel; Christopher B Burge; James C Carrington; Xuemei Chen; Gideon Dreyfuss; Sean R Eddy; Sam Griffiths-Jones; Mhairi Marshall; Marjori Matzke; Gary Ruvkun; Thomas Tuschl
Journal: RNA Date: 2003-03 Impact factor: 4.942

3. The microRNA Registry.

Authors: Sam Griffiths-Jones
Journal: Nucleic Acids Res Date: 2004-01-01 Impact factor: 16.971

4. A gene-coexpression network for global discovery of conserved genetic modules.

Authors: Joshua M Stuart; Eran Segal; Daphne Koller; Stuart K Kim
Journal: Science Date: 2003-08-21 Impact factor: 47.728

Review 5. The functions of animal microRNAs.

Authors: Victor Ambros
Journal: Nature Date: 2004-09-16 Impact factor: 49.962

6. The 21-nucleotide let-7 RNA regulates developmental timing in Caenorhabditis elegans.

Authors: B J Reinhart; F J Slack; M Basson; A E Pasquinelli; J C Bettinger; A E Rougvie; H R Horvitz; G Ruvkun
Journal: Nature Date: 2000-02-24 Impact factor: 49.962

7. Frequent deletions and down-regulation of micro- RNA genes miR15 and miR16 at 13q14 in chronic lymphocytic leukemia.

Authors: George Adrian Calin; Calin Dan Dumitru; Masayoshi Shimizu; Roberta Bichi; Simona Zupo; Evan Noch; Hansjuerg Aldler; Sashi Rattan; Michael Keating; Kanti Rai; Laura Rassenti; Thomas Kipps; Massimo Negrini; Florencia Bullrich; Carlo M Croce
Journal: Proc Natl Acad Sci U S A Date: 2002-11-14 Impact factor: 11.205

8. miR-21-mediated tumor growth.

Authors: M-L Si; S Zhu; H Wu; Z Lu; F Wu; Y-Y Mo
Journal: Oncogene Date: 2006-10-30 Impact factor: 9.867

9. Human microRNA genes are frequently located at fragile sites and genomic regions involved in cancers.

Authors: George Adrian Calin; Cinzia Sevignani; Calin Dan Dumitru; Terry Hyslop; Evan Noch; Sai Yendamuri; Masayoshi Shimizu; Sashi Rattan; Florencia Bullrich; Massimo Negrini; Carlo M Croce
Journal: Proc Natl Acad Sci U S A Date: 2004-02-18 Impact factor: 11.205

10. The nuclear RNase III Drosha initiates microRNA processing.

Authors: Yoontae Lee; Chiyoung Ahn; Jinju Han; Hyounjeong Choi; Jaekwang Kim; Jeongbin Yim; Junho Lee; Patrick Provost; Olof Rådmark; Sunyoung Kim; V Narry Kim
Journal: Nature Date: 2003-09-25 Impact factor: 49.962
