Cheng-Tsung Pan1, Kuo-Wang Tsai2, Tzu-Min Hung3, Wei-Chen Lin4, Chao-Yu Pan3, Hong-Ren Yu5, Sung-Chou Li6.
Abstract
MicroRNAs (miRNAs) perform diverse regulatory functions in a wide range of biological activities. Studies on miRNA functions generally depend on comparing miRNA expression profiles between libraries by using a next-generation sequencing (NGS) platform. Several online web services have been developed to analyze small RNA NGS data. However, these services pose problems: large amounts of NGS data must be submitted, data formats must be converted, and only a limited number of species are supported. In this study, we developed miRSeq as an alternative. To test its performance, we used miRSeq to analyze small RNA NGS data from four species: human, rat, fly, and nematode. The alignment results indicate that miRSeq can precisely evaluate the sequencing quality of samples in terms of the percentage of self-ligation reads, read length distribution, and read category. miRSeq is a user-friendly standalone toolkit featuring a graphical user interface (GUI). After a simple installation, users can easily operate miRSeq on a PC or laptop by using a mouse. Within minutes, miRSeq yields useful miRNA data, including miRNA expression profiles, 3′ end modification patterns, and isomiR forms. Moreover, miRSeq supports the analysis of up to 105 animal species, providing higher flexibility.
Year: 2014 PMID: 25114903 PMCID: PMC4119685 DOI: 10.1155/2014/462135
Source DB: PubMed Journal: Biomed Res Int Impact factor: 3.411
Figure 1. The operation interface of miRSeq. miRSeq is composed of readPro and readMap. (a) readPro processes raw sequence reads in fastq format by collapsing raw reads into unique reads, tabulating read counts, and trimming the 3′ adaptor. (b) readMap is responsible for mapping the reads back to known annotations, classifying reads into different categories, determining the miRNA expression profile, reporting 3′ end modification patterns, and analyzing isomiR forms.
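The readPro steps named in the caption (3′ adaptor trimming, collapsing into unique reads, and read counting) can be illustrated with a minimal Python sketch. This is not miRSeq's implementation: the function names, the exact-match rule on the first 8 adaptor bases, and the toy reads are all assumptions made for illustration. Only the adaptor 1 sequence is taken from the document.

```python
from collections import Counter

# Adaptor 1 sequence as reported in the document.
ADAPTOR_1 = "TGGAATTCTCGGGTGCCAAGG"


def trim_3p_adaptor(read, adaptor, min_match=8):
    """Return the insert upstream of the 3' adaptor, or None if no adaptor is found.

    Assumption: the adaptor is located by an exact match to its first
    `min_match` bases; real tools usually tolerate mismatches.
    """
    idx = read.find(adaptor[:min_match])
    return read[:idx] if idx != -1 else None


def collapse_reads(reads, adaptor):
    """Trim each raw read and tabulate how often each unique insert occurs."""
    counts = Counter()
    for read in reads:
        insert = trim_3p_adaptor(read, adaptor)
        if insert:  # skips reads without an adaptor and adaptor-only reads
            counts[insert] += 1
    return counts


# Toy reads (hypothetical): two identical inserts, one short insert,
# one adaptor-only read, and one read without any adaptor.
raw_reads = [
    "TAGCTTATCAGACTGATGTTGA" + ADAPTOR_1,
    "TAGCTTATCAGACTGATGTTGA" + ADAPTOR_1,
    "ACGT" + ADAPTOR_1,
    ADAPTOR_1,
    "AAAAAAAAAAAAAAAAAAAA",
]
unique_reads = collapse_reads(raw_reads, ADAPTOR_1)
```

Collapsing before mapping means each distinct sequence is aligned only once, with its count carried along, which is one reason this kind of preprocessing is fast on a desktop machine.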
The detailed information of analyzed libraries. SRA in the “Source” column denotes that the small RNA libraries were downloaded from NCBI SRA database. The SRA IDs of the libraries were provided. CGMH denotes that the libraries were prepared and sequenced by our research team in Kaohsiung Chang Gung Memorial Hospital.
| Library | Organism | Source | Details |
|---|---|---|---|
| L1 | Human | CGMH | Human breast cancer cell line: MDA-MB-361 |
| L2 | Human | CGMH | Human prostate cancer cell line: PC3 |
| L3 | Rat | CGMH | Normal lung tissue from 4-month rat |
| L4 | Rat | CGMH | Normal lung tissue from 4-month rat |
| L5 | Nematode | SRA: SRR1175721 | Synchronized adult population of nematodes |
| L6 | Nematode | SRA: SRR1139598 | Stage-matched population of nematodes |
| L7 | Fly | SRA: SRR513989 | Abdomens and thoraxes from w1118 male flies infected by Nora virus |
| L8 | Fly | SRA: SRR351332 | Whole bodies of 2-3-day-old wild-type flies |
Alignment results of readPro. The small RNA NGS data of eight libraries were analyzed with miRSeq. Raw reads were classified as clean, nonclean, or self-ligation after the 3′ adaptor trimming step. Clean reads that meet the specified criteria are classified as qualified reads for further analysis. The sequences of adaptors 1, 2, and 3 are TGGAATTCTCGGGTGCCAAGG, TCGTATGCCGTCTTCTGCTTG, and ATCTCGTATGCCGTCTTCTGCTTG, respectively. The sequence of adaptor C (CTGTAGGCACCATCAATCGT) is based on the information in the corresponding SRA page.
| Library | All | Self-ligation | Nonclean | Clean | Qualified | Adaptor version |
|---|---|---|---|---|---|---|
| L1 | 11,229,160 | 0.72% | 4.65% | 94.62% | 86.15% | Adaptor 1 |
| L2 | 11,501,087 | 0.02% | 3.34% | 96.65% | 86.12% | Adaptor 1 |
| L3 | 6,314,030 | 0.04% | 3.94% | 96.02% | 85.37% | Adaptor 1 |
| L4 | 6,235,528 | 0.03% | 3.71% | 96.26% | 87.98% | Adaptor 1 |
| L5 | 22,634,033 | 15.08% | 7.59% | 77.33% | 68.37% | Adaptor 2 |
| L6 | 9,023,339 | 0.23% | 3.73% | 96.04% | 74.08% | Adaptor 1 |
| L7 | 10,314,488 | 0.00% | 13.61% | 86.39% | 83.08% | Adaptor C |
| L8 | 22,435,248 | 1% | 7% | 92% | 83% | Adaptor 3 |
Figure 2. Length distribution comparisons of clean reads between libraries. From the output of readPro, we can compare the length distributions of clean reads and examine whether length enrichment occurs. (a) The length distribution of the well-prepared L2 library was more similar to that of miRBase miRNAs. (b) The L3 and L4 libraries had similar distribution patterns. (c) The read lengths of the L5 library were scattered without enrichment. (d) Reads of 30 nt dominated the L7 library.
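A length distribution of the kind compared in this figure can be derived from collapsed reads by weighting each unique sequence by its read count. The sketch below is illustrative only; the function name and the toy counts are assumptions, not miRSeq output.

```python
from collections import Counter


def length_distribution(read_counts):
    """Map read length -> fraction of all reads.

    `read_counts` maps each unique read sequence to its read count, so
    every unique read contributes its full count to its length bin.
    """
    totals = Counter()
    for seq, count in read_counts.items():
        totals[len(seq)] += count
    grand_total = sum(totals.values())
    if grand_total == 0:
        return {}
    return {length: n / grand_total for length, n in sorted(totals.items())}


# Toy collapsed reads (hypothetical): most reads are 22 nt long.
toy_counts = {"A" * 22: 6, "C" * 21: 3, "G" * 30: 1}
dist = length_distribution(toy_counts)
peak_length = max(dist, key=dist.get)
```

A clear peak near 22 nt, as in this toy input, is the pattern the caption describes as length enrichment; a flat or shifted profile would correspond to the patterns in panels (c) and (d).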
Figure 3. Illustration of readMap alignment output. From the output of readMap, we can compare read categories, 3′ end modifications, and isomiR patterns among libraries. (a) readMap reports the read categories of each library. The high percentage of rRNA reads in the L1 library resulted from the nonstandard protocol used in the in-gel size fractionation procedure. Non-miRNA reads accounted for much higher proportions in the libraries whose read lengths were less enriched at 22 nt. (b) readMap reports the 3′ end modification patterns of miRNA reads. This information is consistent between libraries from the same species. Here, only the patterns more frequent than 1% are shown. (c) The top sequence denotes the pre-miRNA sequence, with the mature miRNA marked in upper case. All isomiR forms are shown according to their relative positions in the pre-miRNA. The 3′ end modification patterns are presented in lower case. The middle digits and the comma-separated values on the right-hand side denote the read count and the position shift of each isomiR, respectively.
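The idea behind panel (c), a position shift relative to the annotated mature miRNA plus a lower-case 3′ end modification, can be sketched as follows. This is a simplified illustration, not readMap's algorithm: the function name, the 0-based end-exclusive coordinates, the minimum templated length of 16 nt, and the toy precursor are assumptions.

```python
def describe_isomir(read, precursor, mature_start, mature_end, min_templated=16):
    """Return (shift_5p, shift_3p, tail) for a read, or None if it does not map.

    mature_start / mature_end: 0-based, end-exclusive coordinates of the
    annotated mature miRNA within `precursor` (assumed convention).
    The longest read prefix that matches the precursor exactly is treated
    as templated; any remaining 3' bases are reported in lower case as the
    non-templated tail.
    """
    for templated_len in range(len(read), min_templated - 1, -1):
        start = precursor.find(read[:templated_len])
        if start != -1:
            end = start + templated_len
            tail = read[templated_len:].lower()
            return (start - mature_start, end - mature_end, tail)
    return None


# Toy precursor (hypothetical flanks around a 22-nt mature sequence).
flank_5p, mature, flank_3p = "GGCTG", "TAGCTTATCAGACTGATGTTGA", "CTGTT"
precursor = flank_5p + mature + flank_3p
m_start, m_end = len(flank_5p), len(flank_5p) + len(mature)

canonical = describe_isomir(mature, precursor, m_start, m_end)
adenylated = describe_isomir(mature + "A", precursor, m_start, m_end)
shifted = describe_isomir(mature[1:] + "C", precursor, m_start, m_end)
```

Here `adenylated` carries a one-base tail `"a"` with no position shift, while `shifted` starts and ends one base downstream of the annotation and has no tail. Trying the longest prefix first keeps the reported tail as short as possible.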
MiRNA expression profile. The miRNA expression profiles of the libraries are presented in transcripts per million (TPM). Here, only the five most abundant miRNAs of each library are shown.
| Library | 1st miRNA | 2nd miRNA | 3rd miRNA | 4th miRNA | 5th miRNA | % |
|---|---|---|---|---|---|---|
| L1 | hsa-miR-30a-5p | hsa-miR-21-5p | hsa-miR-181a-5p | hsa-miR-92a-3p | hsa-miR-22-3p | 51.69% |
| L2 | hsa-miR-92a-3p | hsa-miR-22-3p | hsa-miR-143-3p | hsa-miR-10a-5p | hsa-miR-21-5p | 46.41% |
| L3 | rno-miR-143-3p | rno-miR-30a-5p | rno-miR-26a-5p | rno-miR-181a-5p | rno-miR-22-3p | 52.77% |
| L4 | rno-miR-143-3p | rno-miR-30a-5p | rno-miR-26a-5p | rno-miR-22-3p | rno-miR-10a-5p | 51.43% |
| L5 | cel-miR-58-3p | cel-miR-70-3p | cel-miR-71-5p | cel-miR-65-5p | cel-miR-241-5p | 83.53% |
| L6 | cel-miR-80-3p | cel-miR-35-3p | cel-miR-52-5p | cel-miR-72-5p | cel-miR-229-5p | 38.84% |
| L7 | dme-miR-1-3p | dme-miR-317-3p | dme-miR-276a-3p | dme-miR-263a-5p | dme-miR-184-3p | 63.56% |
| L8 | dme-miR-1-3p | dme-miR-8-3p | dme-miR-184-3p | dme-let-7-5p | dme-miR-263a-5p | 86.06% |
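TPM normalization scales each miRNA's read count so that the values in one library sum to one million. The sketch below assumes the denominator is the total number of miRNA-mapped reads in the library; the document does not state which denominator miRSeq uses. It also treats the "%" column as the combined share of the five listed miRNAs, which is a plausible reading rather than a confirmed one. The miRNA names and counts are hypothetical.

```python
def to_tpm(mirna_counts):
    """Convert raw miRNA read counts to transcripts per million (TPM).

    Assumption: counts are normalized by the total number of reads
    assigned to miRNAs in the same library.
    """
    total = sum(mirna_counts.values())
    if total == 0:
        return {name: 0.0 for name in mirna_counts}
    return {name: count * 1_000_000 / total for name, count in mirna_counts.items()}


def top_n_share(tpm, n=5):
    """Fraction of total expression contributed by the n most abundant miRNAs."""
    top = sorted(tpm.values(), reverse=True)[:n]
    return sum(top) / 1_000_000


# Hypothetical counts for three miRNAs in one library.
toy_counts = {"miR-a": 500, "miR-b": 300, "miR-c": 200}
tpm = to_tpm(toy_counts)
share_top2 = top_n_share(tpm, n=2)
```

Because TPM values are relative to library size, they allow expression profiles to be compared between libraries sequenced at different depths, such as the roughly 6 million and 22 million reads reported for the libraries in this document.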
The numbers of detected miRNAs and pre-miRNAs. The values in brackets denote the numbers of mature miRNAs and pre-miRNAs of the corresponding species according to miRBase 20 annotation. In addition to the individual miRNA expression profile, the information of all miRNA profiles is also provided.
| Library | Detected miRNA | Detected pre-miRNA | Detected opp-miRNA |
|---|---|---|---|
| L1 | 533 (2,578) | 411 (1,872) | 28 |
| L2 | 1,098 (2,578) | 824 (1,872) | 86 |
| L3 | 425 (728) | 272 (449) | 15 |
| L4 | 444 (728) | 288 (449) | 15 |
| L5 | 229 (368) | 167 (223) | 10 |
| L6 | 259 (368) | 165 (223) | 9 |
| L7 | 257 (426) | 158 (238) | 4 |
| L8 | 269 (426) | 167 (238) | 6 |
The time needed for a miRSeq alignment. Input data: L1 small RNA NGS data, comprising 11,229,160 reads in total and occupying approximately 1.7 GB of disk space.
| OS | CPU | Memory | readPro | readMap |
|---|---|---|---|---|
| 32 bit Win. XP | Intel Pentium 4, 3.0 GHz | 1.0 GB | 6 min. | 40 min. |
| 32 bit Win. XP | Intel Atom D525, 1.0 GHz | 3.0 GB | 12 min. | 52 min. |
| 64 bit Win. 7 | Intel Core i5, 1.7 GHz | 4.0 GB | 5 min. | 4 min. |
| 64 bit Win. Server 2008 | Intel Xeon E5-2620, 2.0 GHz | 16.0 GB | 5 min. | 7 min. |