Burcu Alptekin1, Bala A Akpinar2, Hikmet Budak1.
Abstract
microRNAs (miRNAs) are small ribo-regulatory molecules involved in pathways essential to cellular life, such as development, environmental adaptation, and stress response. In recent years, miRNAs have become a major focus in molecular biology because of their functional and diagnostic importance. This interest has led to the development of many dedicated software tools and pipelines for identifying miRNAs and their targets, which is key to elucidating miRNA-modulated gene expression. While the well-recognized importance of miRNAs in clinical research has driven the emergence of many useful computational identification approaches for animals, fewer tools and pipelines are available for plants. Additionally, existing approaches suffer from mis-identification and mis-annotation of plant miRNAs, since miRNA mining in plants is highly prone to false positives, particularly in cereals, which have highly repetitive genomes. Our group previously developed a homology-based in silico miRNA identification approach for plants that uses two Perl scripts, "SUmirFind" and "SUmirFold"; since then, this method has helped identify many miRNAs, particularly from crop species such as Triticum or Aegilops. Herein, we describe a comprehensive, updated guideline that adds two new scripts, "SUmirPredictor" and "SUmirLocator," and refines our previous method in order to identify genuine miRNAs with increased sensitivity, taking into account the problems specific to miRNA identification in plants. These updates enable our method to provide more reliable and precise results in an automated fashion, and offer solutions for eliminating most false-positive predictions and for miRNA naming and mis-annotation. The method also provides a comprehensive view of the genome/transcriptome-wide location of miRNA precursors, as well as their association with transposable elements. 
The "SUmirPredictor" and "SUmirLocator" scripts are freely available together with a reference high-confidence plant miRNA list.
Keywords: SUmirLocator; SUmirPredictor; TE-miR; miRNA; miRNA annotation
Year: 2017 PMID: 28174574 PMCID: PMC5258749 DOI: 10.3389/fpls.2016.02058
Source DB: PubMed Journal: Front Plant Sci ISSN: 1664-462X Impact factor: 5.753
Figure 1. An overview of the miRNA identification methodology. The pipeline accepts genomic and transcriptomic sequences in “fasta” format. It can also work with small RNA sequencing data with some modifications. The “SUmirFind” script detects putative miRNAs by aligning the input sequences to known plant miRNAs, allowing 2 or fewer mismatches. Candidate sequences are then searched for pre-miRNA-like secondary structures by “SUmirFold”, and the candidates are further filtered by “SUmirPredictor” based on miRNA precursor characteristics. Potential miRNA sequences are also screened for false-positive predictions by aligning the candidates to other known small RNA sequences and organellar genomes. The resulting final list of mature miRNAs and their precursors is subjected to a few more analyses for characterization and annotation of the miRNAs. Detected putative pre-miRNA structures are further evaluated for representation and genomic/transcriptomic distribution with the help of the “SUmirLocator” script. Target identification and enrichment analysis of miRNA candidates are conducted with “psRNAtarget” and Blast2GO. Candidate miRNAs are also analyzed for in silico expression evidence at both the pre-miRNA and mature miRNA levels. Additionally, miRNA precursors are examined for their association with transposable elements (TEs) and, based on the level of that association, are further characterized as TE-miRs or siRNA candidates.
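The homology step described in this caption can be illustrated with a minimal Python sketch. It is not the “SUmirFind” implementation; it only assumes that a candidate is any window of the input sequence that differs from a known mature miRNA at 2 or fewer positions. The function names and the toy sequences are hypothetical, and only the forward strand is scanned.

```python
def count_mismatches(a, b):
    """Number of positions at which two equal-length sequences differ."""
    return sum(x != y for x, y in zip(a, b))


def find_candidates(sequence, known_mirna, max_mismatches=2):
    """Slide the known miRNA along the sequence (forward strand only).

    Returns a list of (start, window, mismatches) tuples, with 0-based
    start positions, for every window within the mismatch limit.
    """
    k = len(known_mirna)
    hits = []
    for start in range(len(sequence) - k + 1):
        window = sequence[start:start + k]
        mismatches = count_mismatches(window, known_mirna)
        if mismatches <= max_mismatches:
            hits.append((start, window, mismatches))
    return hits


if __name__ == "__main__":
    # Toy data (hypothetical): a 20-nt reference and a sequence that
    # contains a copy of it with a single substitution.
    reference = "TGACAGAAGAGAGTGAGCAC"
    contig = "CCCC" + "TGACAGAAGATAGTGAGCAC" + "GGGG"
    for hit in find_candidates(contig, reference):
        print(hit)
```

A production search would normally rely on an alignment tool and would also scan the reverse complement; the sliding window here is only meant to make the mismatch criterion concrete.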
Figure 2. Different hairpins obtained as “SUmirFold” outputs and their filtering by “SUmirPredictor”. (1) The mature miRNA starts at the 21st base and ends at the 41st base, while the miRNA* starts at the 86th base and ends at the 106th base [indicated by black (mature miRNA) and red (miRNA*) sticks]. There is no mismatch in the DICER-LIKE enzyme cutting region and there is a proper loop structure. Such structures are marked as “OK” by “SUmirPredictor” since they show genuine miRNA characteristics. (2) The mature miRNA aligns between the 44th and 64th bases, while the miRNA* is detected between the 21st and 40th bases. Since two nucleotides of the miRNA* align in the head section of the hairpin structure, the DICER-LIKE enzyme may not process it properly; thus, this miRNA is marked as “Head” by “SUmirPredictor”. (3) The mature miRNA aligns between the 21st and 41st bases, and there is a mismatch at the start of the mature miRNA, where the DICER-LIKE enzyme cutting region is located. The enzyme may not be able to process such sequences, and such structures are marked as “Dicer-cut”. (4) The miRNA precursor has more than one loop structure on its head, so this structure is marked as “Multiloop” by “SUmirPredictor” (the two different loops are indicated by arrows).
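As a rough illustration of the “Multiloop” check in case (4), the sketch below counts terminal (hairpin) loops in a predicted structure. It assumes the structure is available in dot-bracket notation, which the caption does not state, and it is a simple heuristic rather than the actual “SUmirPredictor” logic; the function names are hypothetical.

```python
import re

# A terminal (hairpin) loop in dot-bracket notation is an opening bracket,
# one or more unpaired bases, and then a closing bracket.
HAIRPIN_LOOP = re.compile(r"\(\.+\)")


def count_hairpin_loops(dot_bracket):
    """Count terminal loops in a dot-bracket structure string."""
    return len(HAIRPIN_LOOP.findall(dot_bracket))


def is_multiloop(dot_bracket):
    """Flag structures whose head carries more than one terminal loop."""
    return count_hairpin_loops(dot_bracket) > 1


if __name__ == "__main__":
    single = "((((....))))"              # one stem, one loop
    branched = "((..((...))..((...))..))"  # two terminal loops
    print(is_multiloop(single), is_multiloop(branched))
```

The other flags in the caption (“Head”, “Dicer-cut”) depend on where the mature miRNA and miRNA* sit relative to the stem and loop, so they would need the duplex coordinates in addition to the structure string.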
Figure 3. Redundant annotations detected by “SUmirPredictor”. The criterion of up to three mismatches used in the initial identification of candidate mature miRNA sequences may lead to redundant annotations of the same candidate sequence (indicated by arrows). Here, Arabidopsis lyrata miR156g-5p and miR157d-5p had mismatched bases at different locations; consequently, the same mature miRNA sequence appears twice, as two different candidates (Aly: Arabidopsis lyrata). Since aly-miR156g-5p displayed higher sequence homology to the putative mature miRNA sequence, the candidate miRNA was named miR156 and the miR157 annotation was eliminated.
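The redundancy rule in this caption (keep the annotation with the higher homology) can be sketched as follows. This is a minimal illustration, not the “SUmirPredictor” code: it assumes each hit is a tuple of candidate sequence, reference miRNA name, and mismatch count, and the mismatch values in the example are hypothetical.

```python
def resolve_redundant_annotations(hits):
    """Keep one annotation per candidate sequence.

    hits: iterable of (candidate_sequence, reference_name, mismatches).
    For each candidate sequence, the reference with the fewest mismatches
    is kept; on a tie, the first one encountered wins.
    Returns a dict mapping sequence -> (reference_name, mismatches).
    """
    best = {}
    for sequence, reference, mismatches in hits:
        if sequence not in best or mismatches < best[sequence][1]:
            best[sequence] = (reference, mismatches)
    return best


if __name__ == "__main__":
    # Hypothetical mismatch counts for one candidate sequence that was
    # matched to two different reference miRNAs.
    candidate = "CANDIDATE_SEQUENCE"
    hits = [
        (candidate, "aly-miR157d-5p", 2),
        (candidate, "aly-miR156g-5p", 1),
    ]
    print(resolve_redundant_annotations(hits))
```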
Figure 4. Including pre-miRNA sequences that code for different mature miRNAs in the miRNA representation. The putative pre-miR156 sequence is predicted to encode two distinct mature miRNA sequences of the miR156 family. Both of these miRNAs are included in the genomic representation as separate units. Note that these sequences are not a mature miRNA/miRNA* pair; instead, they are two different sequences belonging to the miR156 family.
Summary statistics of miRNA identification and filtering corresponding to four different data sets from .
| ~272 Mbp | 14,376 | 4090 (+1062 suspects) | 1015 | 40 | |
| ~218 Mbp | 9482 | 1198 (+106 suspects) | 87 | 21 | |
| ~17 Gbp | 118,100 | 14,290 (+3116 suspects) | 7627 | 48 | |
| ~114 Mbp | 5688 | 265 (+32 suspects) | 106 | 20 |
Figure 5. (A) Distribution of identified putative miRNAs on different chromosomes of B. distachyon. (B) Distribution of identified putative miRNAs on different chromosomes of T. aestivum. (C) miRNA content of each sub-genome of T. aestivum.
Distribution of miRNA families on the different chromosomes of B. distachyon (Bd) and T. aestivum (Tae).
| Chromosome | miRNA families |
| --- | --- |
| Bd1 | miR1122, miR127, miR1128, miR1133, miR1135, miR1432, miR1435, miR1436, miR1439, miR160, miR166, miR167, miR169, miR171, miR395, miR396, miR399, miR437, miR528 |
| Bd2 | miR1122, miR1128, miR1133, miR1135, miR1139, miR135, miR1436, miR1439, miR156, miR157, miR159, miR164, miR169, miR319, miR399, miR437 |
| Bd3 | miR1118, miR1122, miR1128, miR1133, miR1135, miR1136, miR1139, miR1435, miR1436, miR1439, miR156, miR160, miR164, miR166, miR169, miR172, miR2275, miR394, miR395, miR397, miR437, miR529, miR818, miR845 |
| Bd4 | miR1122, miR1128, miR1133, miR1135, miR1435, miR1436, miR1439, miR156, miR157, miR166, miR167, miR169, miR2118, miR437, miR818 |
| Bd5 | miR1118, miR1122, miR1128, miR1133, miR1135, miR1136, miR1139, miR156, miR157, miR169, miR171, miR2118, miR2218, miR393, miR395, miR399, miR479, miR482, miR530 |
| Tae chr1A | miR1117, miR1118, miR1120, miR1122, miR1128, miR1131, miR1135, miR1136, miR1137, miR1436, miR164, miR166, miR171, miR399, miR9664, miR9666 |
| Tae chr1B | miR1117, miR1118, miR1122, miR1123, miR1125, miR1128, miR1131, miR1133, miR1135, miR1136, miR1137, miR1436, miR164, miR166, miR171, miR399, miR9664 |
| Tae chr1D | miR1117, miR1121, miR1122, miR1125, miR1128, miR1135, miR1136, miR1137, miR1139, miR1436, miR164, miR166, miR171, miR399, miR9664 |
| Tae chr2A | miR1117, miR1118, miR1120, miR1121, miR1122, miR1125, miR1128, miR1131, miR1135, miR1136, miR1137, miR1436, miR169, miR393, miR395, miR399, miR530, miR9666 |
| Tae chr2B | miR1117, miR1118, miR1120, miR1122, miR1123, miR1125, miR1128, miR1130, miR1131, miR1133, miR1135, miR1136, miR1137, miR1436, miR169, miR171, miR393, miR395, miR399, miR437, miR530 |
| Tae chr2D | miR1117, miR1118, miR1120, miR1122, miR1125, miR1131, miR1135, miR1136, miR1137, miR1139, miR1436, miR169, miR393, miR395, miR399, miR530 |
| Tae chr3A | miR1117, miR1118, miR1120, miR1121, miR1122, miR1125, miR1135, miR1136, miR1137, miR1139, miR1436, miR156, miR393, miR399, miR9666, miR9677 |
| Tae chr3B | miR1117, miR1118, miR1120, miR1121, miR1122, miR1128, miR1131, miR1133, miR1135, miR1136, miR1137, miR1436, miR1439, miR156, miR172, miR319, miR437, miR9677 |
| Tae chr3D | miR1117, miR1118, miR1122, miR1135, miR1136, miR1137, miR1138, miR1436, miR1439, miR156, miR399, miR9669, miR9677 |
| Tae chr4A | miR1117, miR1118, miR1120, miR1122, miR1125, miR1128, miR1131, miR1133, miR1135, miR1136, miR1137, miR1436, miR167, miR169, miR171, miR9666 |
| Tae chr4B | miR1117, miR1118, miR1120, miR1121, miR1122, miR1125, miR1128, miR1130, miR1131, miR1133, miR1135, miR1136, miR1137, miR1436, miR169, miR171 |
| Tae chr4D | miR1117, miR1118, miR1122, miR1128, miR1135, miR1136, miR1137, miR1436, miR166, miR167, miR169, miR171 |
| Tae chr5A | miR1117, miR1118, miR1120, miR1122, miR1125, miR1131, miR1135, miR1136, miR1137, miR1139, miR1436, miR156, miR166, miR167, miR169, miR528, miR9666, miR9772 |
| Tae chr5B | miR1117, miR1118, miR1120, miR1121, miR1122, miR1125, miR1128, miR1131, miR1133, miR1135, miR1136, miR1137, miR1436, miR160, miR166, miR167, miR169, miR2118, miR398, miR5062, miR9772 |
| Tae chr5D | miR1117, miR1118, miR1120, miR1121, miR1122, miR1135, miR1136, miR1137, miR1138, miR1436, miR1439, miR156, miR160, miR166, miR167, miR169, miR398, miR9772 |
| Tae chr6A | miR1117, miR1118, miR1121, miR1122, miR1131, miR1135, miR1136, miR1137, miR1436, miR156, miR160, miR394, miR396, miR397, miR9666, miR9670 |
| Tae chr6B | miR1117, miR1118, miR1120, miR1121, miR1122, miR1125, miR1128, miR1131, miR1133, miR1135, miR1136, miR1137, miR1436, miR160, miR164, miR394, miR396, miR397, miR9663 |
| Tae chr6D | miR1117, miR1118, miR1122, miR1125, miR1131, miR1135, miR1136, miR1137, miR1436, miR156, miR160, miR164, miR394, miR396, miR9662, miR9670 |
| Tae chr7A | miR1117, miR1118, miR1121, miR1122, miR1123, miR1125, miR1131, miR1135, miR1136, miR1137, miR1436, miR160, miR169, miR2275, miR396, miR399, miR9666 |
| Tae chr7B | miR1117, miR1118, miR1122, miR1125, miR1128, miR1131, miR1135, miR1136, miR1137, miR1436, miR160, miR166, miR169, miR396, miR399 |
| Tae chr7D | miR1117, miR1118, miR1120, miR1121, miR1122, miR1125, miR1131, miR1135, miR1136, miR1137, miR1436, miR160, miR166, miR169, miR2275, miR399 |
| Tae chrUn | miR1117, miR1121, miR1122, miR1128, miR1131, miR1133, miR1135, miR1136, miR1137, miR1436, miR169, miR171, miR399, miR9666 |
Figure 6. Alternative splicing of miRNA precursors. miRNA genes might undergo alternative splicing, and the spliced variants might generate different miRNA precursors. To understand such effects of alternative splicing, miRNA precursors identified from transcriptomic data were aligned back to the genome with GMAP, and the alignment results were visualized with IGV. In this example, three different contigs can be transcribed from the same region of the Brachypodium genome (c63509_g1_i1, c63509_g1_i2, and c63509_g1_i3). Generation of isoforms 1 and 2 leads to the formation of miRNA members from both the miR169 and miR1436 families. If isoform 3 is produced, the miR1436 sequences cannot be generated from this transcript.
Figure 7. Alignment of small RNA sequencing data to pre-miRNA. The small RNA sequencing reads are aligned to the predicted pre-miRNA sequences to show the efficiency of “SUmirFold” and “SUmirPredictor” in detecting mature miRNA/miRNA* pairs on the miRNA precursors. Alignment results from GMAP and Bowtie2, visualized with IGV, showed that small RNA reads were successfully aligned to the predicted locations of the mature miRNA/miRNA* pair for miR1432-3p-1 from B. distachyon. This analysis can be used to inspect the genuineness of miRNA precursor sequences, in addition to in silico pre-miRNA expression analysis.
Figure 8. Distribution of TE element families on B. distachyon miRNAs. (A) Distribution of TE element families on B. distachyon transcriptome miRNAs. (B) Distribution of TE element families on B. distachyon genome miRNAs.
Figure 9. Distribution of TE element families on T. aestivum miRNAs. (A) Distribution of TE element families on T. aestivum transcriptome miRNAs. (B) Distribution of TE element families on T. aestivum genome miRNAs.
Figure 10. Distribution of sRNA reads on a putative TE-miR. miR156-3p-1 from the T. aestivum genome is a TE-miR candidate that aligned to the TE element “RLC_36906|LTR_Sb_chr_09_853” over more than 50% of its length. The sRNA reads are concentrated on the regions where the mature miRNA and miRNA* sequences were predicted to be located by “SUmirFold” (the mature miRNA lies between the 101st and 122nd bases, marked with a black square, and the star sequence between the 21st and 40th bases, marked with a red square).
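The “more than 50% of its length” criterion mentioned in this caption amounts to a coverage calculation, sketched below in Python. This is an illustrative sketch under stated assumptions rather than the authors' procedure: alignment blocks are assumed to be given as 1-based inclusive (start, end) coordinates on the precursor, and the function names and example coordinates are hypothetical.

```python
def covered_fraction(precursor_length, aligned_intervals):
    """Fraction of the precursor covered by TE alignment blocks.

    aligned_intervals: iterable of 1-based inclusive (start, end) pairs.
    Overlapping blocks are counted only once.
    """
    covered = 0
    last_end = 0
    for start, end in sorted(aligned_intervals):
        # Skip the part already counted by an earlier block.
        start = max(start, last_end + 1)
        if end >= start:
            covered += end - start + 1
            last_end = end
    return covered / precursor_length


def is_te_mir_candidate(precursor_length, aligned_intervals, threshold=0.5):
    """True when TE alignments cover more than `threshold` of the precursor."""
    return covered_fraction(precursor_length, aligned_intervals) > threshold


if __name__ == "__main__":
    # Hypothetical 100-nt precursor with two overlapping TE alignment blocks.
    print(covered_fraction(100, [(1, 30), (20, 60)]))
    print(is_te_mir_candidate(100, [(1, 30), (20, 60)]))
```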
Figure 11. Distribution of sRNA reads on a putative siRNA candidate. miR1436-3p-156 from the B. distachyon genome is a siRNA candidate that shows similarity to the TE element “DTC_155186|DTC_Jorge_3B_034_E06-2”, and the dispersed distribution of sRNA reads on the precursor supports the genuineness of this predicted siRNA. Instead of being concentrated at the mature miRNA (marked with black sticks, from the 82nd to the 102nd base) and miRNA* locations (marked with red sticks, from the 21st to the 41st base), sRNA reads are dispersed along the precursor.
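The contrast between reads concentrated on the predicted miRNA/miRNA* positions (Figure 10) and reads dispersed along the precursor (Figure 11) can be quantified with a simple fraction, sketched below. This is only an illustrative measure, not a step of the published pipeline: read coordinates are assumed to be 1-based inclusive positions on the precursor, the example reads are hypothetical, and no cut-off is implied because the captions do not give one.

```python
def fraction_in_duplex(read_intervals, mature, star):
    """Fraction of reads lying entirely within the mature or star region.

    read_intervals: list of 1-based inclusive (start, end) read positions.
    mature, star:   1-based inclusive (start, end) regions on the precursor.
    """
    def inside(read, region):
        return region[0] <= read[0] and read[1] <= region[1]

    if not read_intervals:
        return 0.0
    hits = sum(1 for read in read_intervals
               if inside(read, mature) or inside(read, star))
    return hits / len(read_intervals)


if __name__ == "__main__":
    # Regions taken from the Figure 11 caption; the reads are made up.
    mature, star = (82, 102), (21, 41)
    reads = [(82, 102), (21, 41), (50, 70), (1, 20)]
    print(fraction_in_duplex(reads, mature, star))
```

A value close to 1 would correspond to the concentrated pattern described for the TE-miR candidate, while a low value would correspond to the dispersed pattern described for the siRNA candidate.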
The most enriched known targets of mostly represented miRNAs from .
| miRNA family | Most enriched known targets |
| --- | --- |
| miR1117 | NA |
| miR1122 | Pre-mRNA-processing-splicing factor 8, Uncharacterized protein LOC100837429 isoform X1 |
| miR1130 | Tropinone reductase, Kinesin KIF15 |
| miR1436 | Methyltransferase 6 isoform X2, WAT1-related At5g64700-like protein, Calcium-dependent kinase |
| miR1439 | Uncharacterized protein LOC100824126, Weak chloroplast movement under blue light 1-like, Ubiquitin carboxyl-terminal hydrolase 27 isoform X1 |
| miR156 | Squamosa promoter-binding 3 |
| miR166 | Uncharacterized protein LOC106866306 |
| miR169 | Uncharacterized protein LOC100822852 isoform X1, Probable transport Sec1a isoform X2 |