Pradeep Kumar Naik, Vinay Kumar Mittal, Sumit Gupta.
Abstract
Predicting non-long terminal repeat (non-LTR) retrotransposons, such as long interspersed nuclear elements (LINEs) and short interspersed nuclear elements (SINEs), from DNA sequence is still an open problem in bioinformatics. To improve the quality of annotation of LINEs and SINEs, an automated tool, "RetroPred", was developed. The pipeline allowed rapid and thorough annotation of non-LTR retrotransposons. The non-LTR retrotransposable elements were first predicted by Pairwise Aligner for Long Sequences (PALS) and Parsimonious Inference of a Library of Elementary Repeats (PILER). The predicted non-LTR elements were then automatically classified into LINEs and SINEs using an artificial neural network (ANN) based on the position specific probability matrix (PSPM) generated by Multiple EM for Motif Elicitation (MEME). Under four-fold cross-validation, the ANN model achieved accuracy = 78.79 +/- 6.86 %, Q(pred) = 74.734 +/- 17.08 %, sensitivity = 84.48 +/- 6.73 % and specificity = 77.13 +/- 13.39 %. As proof of principle, we thoroughly annotated the locations of LINEs and SINEs in the rice and Arabidopsis genomes using the tool, which proved to be very useful with good accuracy. Our tool is accessible at http://www.juit.ac.in/RepeatPred/home.html.
Keywords: LINEs; SINEs; artificial neural network; classification; non-LTR retrotransposons; prediction
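The abstract reports accuracy, Q(pred), sensitivity and specificity as a mean +/- spread over four cross-validation folds. The following is a minimal sketch of how such figures are commonly derived from per-fold confusion-matrix counts; it is not the authors' code. Several things here are assumptions: the abstract does not define Q(pred), so it is taken to be TP / (TP + FP); the spread is taken to be the sample standard deviation; treating one class (for example LINE) as "positive" is a choice made for illustration; and the fold counts are hypothetical placeholders, not data from the paper.

```python
from statistics import mean, stdev


def classification_metrics(tp, tn, fp, fn):
    """Return accuracy, sensitivity, specificity and Q(pred) as percentages.

    Q(pred) = TP / (TP + FP) is an ASSUMED definition; the abstract
    does not spell it out.
    """
    total = tp + tn + fp + fn
    if total == 0:
        raise ValueError("confusion matrix is empty")

    def pct(numerator, denominator):
        # Guard against a fold in which a class is never seen or predicted.
        return 100.0 * numerator / denominator if denominator else float("nan")

    return {
        "accuracy": pct(tp + tn, total),
        "sensitivity": pct(tp, tp + fn),
        "specificity": pct(tn, tn + fp),
        "q_pred": pct(tp, tp + fp),
    }


def summarize_folds(folds):
    """Map each metric name to (mean, sample standard deviation) across folds."""
    per_fold = [classification_metrics(*fold) for fold in folds]
    return {
        name: (mean(m[name] for m in per_fold), stdev(m[name] for m in per_fold))
        for name in per_fold[0]
    }


# HYPOTHETICAL (tp, tn, fp, fn) counts for four folds; not from the paper.
HYPOTHETICAL_FOLDS = [
    (40, 30, 10, 20),
    (45, 28, 12, 15),
    (38, 33, 7, 22),
    (42, 31, 9, 18),
]

if __name__ == "__main__":
    for name, (avg, sd) in summarize_folds(HYPOTHETICAL_FOLDS).items():
        print(f"{name}: {avg:.2f} +/- {sd:.2f} %")
```

Reporting the per-fold spread alongside the mean, as the abstract does, makes it visible when a metric such as Q(pred) varies widely between folds.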
Year: 2008 PMID: 18317579 PMCID: PMC2258426 DOI: 10.6026/97320630002263
Source DB: PubMed Journal: Bioinformation ISSN: 0973-2063
Figure 1. The flow diagram used for identification and configuration of the artificial neural network (ANN) for classification of predicted non-LTR retrotransposons into LINEs and SINEs.
Figure 2. The steps followed for generation of the position specific probability matrix (PSPM) of the datasets from three different sources using Multiple EM for Motif Elicitation (MEME).
Figure 3. Graphical output of the program showing the locations of LINEs and SINEs on the chromosome. The red regions represent the locations of SINEs and the green regions represent the LINEs in the chromosomal DNA. The positions of the SINEs and LINEs are given in megabase pairs (Mbp).