Kai Wang, Joshua Hoeksema, Chun Liang.
Abstract
Piwi-interacting RNAs (piRNAs) are the largest class of small non-coding RNAs discovered in germ cells. Identifying piRNAs from small RNA data is challenging because piRNAs lack conserved sequences and structural features. Many programs have been developed to identify piRNAs from small RNA data, but they have limitations: they either rely on extracting complicated features or demonstrate strong performance only on transposon-related piRNAs. Here we propose a new program, piRNN, for piRNA identification. piRNN uses a convolutional neural network classifier trained on datasets from four species (Caenorhabditis elegans, Drosophila melanogaster, rat and human), with each sequence represented by a matrix of k-mer frequency values. piRNN is easy to use and performs better than the other programs compared. It is freely available at https://github.com/bioinfolabmu/piRNN.
Keywords: Convolution neural network; Deep learning; piRNA
Year: 2018 PMID: 30083483 PMCID: PMC6078063 DOI: 10.7717/peerj.5429
Source DB: PubMed Journal: PeerJ ISSN: 2167-8359 Impact factor: 2.984
Figure 1. Feature extraction.
(A) Sequence features consist of two parts. The first part is the k-mer motifs of the whole sequence. The second part is the k-mer motifs around the first T/U and/or the 10th A. If a sequence does not start with T/U or does not have an A at position 10, the corresponding second part is not calculated. (B) The 1,364 vectors are transformed into this matrix.
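The whole-sequence part of the feature extraction described in (A) can be illustrated with a minimal Python sketch. It rests on an assumption that is not stated in the caption: 1,364 equals 4 + 16 + 64 + 256 + 1,024, the number of possible k-mers for k = 1 to 5 over a four-letter alphabet, so the sketch counts k-mers for those five values of k. The function names `kmer_frequencies` and `signature_flags` are hypothetical, and the sketch does not reproduce the second feature part (motifs around the first T/U and the 10th A) or the reshaping into the matrix shown in (B).

```python
from itertools import product

ALPHABET = "ACGT"


def kmer_frequencies(seq, k_max=5):
    """Return normalized k-mer frequencies for k = 1..k_max in a fixed order.

    Assumption: k runs from 1 to 5, which yields 4 + 16 + 64 + 256 + 1024
    = 1364 values. Whether piRNN composes its features exactly this way
    is not confirmed by the figure caption.
    """
    seq = seq.upper().replace("U", "T")  # treat RNA and DNA spellings alike
    features = []
    for k in range(1, k_max + 1):
        kmers = ["".join(p) for p in product(ALPHABET, repeat=k)]
        counts = dict.fromkeys(kmers, 0)
        for i in range(max(len(seq) - k + 1, 0)):
            kmer = seq[i:i + k]
            if kmer in counts:  # skip windows with N or other symbols
                counts[kmer] += 1
        total = sum(counts.values())
        features.extend(counts[m] / total if total else 0.0 for m in kmers)
    return features


def signature_flags(seq):
    """Report whether the read starts with T/U and has an A at position 10."""
    seq = seq.upper().replace("U", "T")
    starts_with_t = seq[:1] == "T"
    tenth_is_a = len(seq) >= 10 and seq[9] == "A"
    return starts_with_t, tenth_is_a
```

Under these assumptions, `kmer_frequencies` returns a list of 1,364 values for any input read, and `signature_flags` indicates which of the two position-specific feature groups would apply to that read.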
Figure 2. Architecture of the convolution neural network.
Figure 3. piRNN performance with different k values for different species.
(A), (B), (C), and (D) show the results of C. elegans, D. melanogaster, rat, and human, respectively.
Comparison between Piano, piRNApredictor, 2L-piRNA, and piRNN.
The results of piRNN are highlighted in bold.
| Program | Species | ACC | Pre | Sn | Sp | MCC |
|---|---|---|---|---|---|---|
| Piano | Fruit fly | 0.68 ± 0.021 | 0.63 ± 0.018 | 0.87 ± 0.021 | 0.50 ± 0.034 | 0.40 ± 0.043 |
| Piano | Human | 0.62 ± 0.013 | 0.58 ± 0.015 | 0.92 ± 0.008 | 0.32 ± 0.016 | 0.30 ± 0.02 |
| piRNApredictor | Fruit fly | 0.53 ± 0.013 | 0.66 ± 0.066 | 0.14 ± 0.025 | 0.93 ± 0.012 | 0.11 ± 0.043 |
| piRNApredictor | Human | 0.72 ± 0.019 | 0.84 ± 0.019 | 0.55 ± 0.04 | 0.89 ± 0.014 | 0.47 ± 0.032 |
| 2L-piRNA | Fruit fly | 0.52 ± 0.027 | 0.65 ± 0.039 | 0.39 ± 0.035 | 0.71 ± 0.30 | 0.10 ± 0.051 |
| 2L-piRNA | Human | 0.67 ± 0.028 | 0.68 ± 0.031 | 0.79 ± 0.025 | 0.51 ± 0.042 | 0.31 ± 0.055 |
| piRNN | Fruit fly | | | | | |
| piRNN | Human | | | | | |
Note:
ACC, Accuracy; Pre, precision; Sn, sensitivity; Sp, specificity; MCC, Matthews correlation coefficient.
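
The five measures abbreviated in the note follow their standard confusion-matrix definitions, which can be sketched in Python as below. The function name `classification_metrics` and the zero-division fallbacks are assumptions made for illustration; the paper's own evaluation code is not shown here.

```python
import math


def classification_metrics(tp, fp, tn, fn):
    """Compute ACC, Pre, Sn, Sp and MCC from confusion-matrix counts.

    tp/fn: piRNAs predicted as piRNA / non-piRNA.
    tn/fp: non-piRNAs predicted as non-piRNA / piRNA.
    Ratios with an empty denominator fall back to 0.0 (an assumption).
    """
    def ratio(num, den):
        return num / den if den else 0.0

    mcc_den = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return {
        "ACC": ratio(tp + tn, tp + fp + tn + fn),  # accuracy
        "Pre": ratio(tp, tp + fp),                 # precision
        "Sn": ratio(tp, tp + fn),                  # sensitivity (recall)
        "Sp": ratio(tn, tn + fp),                  # specificity
        "MCC": ratio(tp * tn - fp * fn, mcc_den),  # Matthews correlation
    }
```

For example, with hypothetical counts from a balanced test set of 100 piRNAs and 100 non-piRNAs, `classification_metrics(tp=87, fp=50, tn=50, fn=13)` gives a sensitivity of 0.87 and a specificity of 0.50, which shows how a classifier can reach high sensitivity while its MCC stays low.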