Mostafa A Salama, Aboul Ella Hassanien, Ahmad Mostafa.
Abstract
Viral evolution remains a main obstacle to the effectiveness of antiviral treatments. The ability to predict this evolution will help in the early detection of drug-resistant strains and will potentially facilitate the design of more efficient antiviral treatments. Various tools have been utilized in genome studies to achieve this goal. One of these tools is machine learning, which facilitates the study of structure-activity relationships, secondary and tertiary structure evolution prediction, and sequence error correction. This work proposes a novel machine learning technique for predicting the possible point mutations that appear on alignments of primary RNA sequence structure. It predicts the genotype of each nucleotide in the RNA sequence, and proves that a nucleotide in an RNA sequence changes based on the other nucleotides in the sequence. A neural network technique is used to predict new strains, and a rough set theory based algorithm is then introduced to extract these point mutation patterns. The algorithm is applied to a time series of aligned RNA isolates of the Newcastle virus. Two different data sets from two sources are used to validate these techniques. The results show that the accuracy of this technique in predicting the nucleotides in the new generation is as high as 75%. The mutation rules are visualized to analyze the correlation between different nucleotides in the same RNA sequence.
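The prediction step described in the abstract can be illustrated with a minimal sketch. The abstract does not give the network architecture, the input encoding, or the training procedure, so everything here is an assumption: each aligned sequence is one-hot encoded, one single-layer softmax network per position stands in for the paper's neural network, and the toy sequences are invented purely for illustration. The rough set rule extraction step is not shown.

```python
import math
import random

NUCLEOTIDES = "ACGT"

def one_hot(seq):
    """Flatten an aligned sequence into four 0/1 features per position."""
    return [1.0 if nt == n else 0.0 for nt in seq for n in NUCLEOTIDES]

def training_pairs(generations, pos):
    """Pair each generation with the nucleotide at `pos` in the following generation."""
    return [(one_hot(prev), NUCLEOTIDES.index(nxt[pos]))
            for prev, nxt in zip(generations, generations[1:])]

def scores(model, x):
    """Linear score of each of the four nucleotide classes for input x."""
    w, b = model
    return [sum(wi * xi for wi, xi in zip(w[k], x)) + b[k] for k in range(4)]

def softmax(z):
    m = max(z)
    e = [math.exp(v - m) for v in z]
    total = sum(e)
    return [v / total for v in e]

def train(pairs, epochs=200, lr=0.5, seed=0):
    """Fit a single-layer softmax network by stochastic gradient descent
    (an assumed stand-in, not the architecture used in the paper)."""
    rng = random.Random(seed)
    n_in = len(pairs[0][0])
    w = [[rng.uniform(-0.01, 0.01) for _ in range(n_in)] for _ in range(4)]
    b = [0.0] * 4
    for _ in range(epochs):
        for x, y in pairs:
            p = softmax(scores((w, b), x))
            for k in range(4):
                g = p[k] - (1.0 if k == y else 0.0)  # cross-entropy gradient
                b[k] -= lr * g
                for j, xj in enumerate(x):
                    w[k][j] -= lr * g * xj
    return w, b

def predict(model, seq):
    """Most likely nucleotide at the model's position in the next generation."""
    s = scores(model, one_hot(seq))
    return NUCLEOTIDES[s.index(max(s))]

def predict_next(generations):
    """Train one model per position and predict the whole next generation."""
    length = len(generations[0])
    return "".join(predict(train(training_pairs(generations, pos)), generations[-1])
                   for pos in range(length))

def accuracy(predicted, actual):
    """Fraction of positions where the predicted nucleotide matches the actual one."""
    return sum(p == a for p, a in zip(predicted, actual)) / len(actual)

# Hypothetical toy time series: positions 0 and 2 flip together each generation.
toy_generations = ["ACGT", "GCAT", "ACGT", "GCAT", "ACGT", "GCAT"]
predicted = predict_next(toy_generations)
print(predicted, accuracy(predicted, "ACGT"))
```

Because every model sees the full previous sequence as input, the prediction for one position can depend on the nucleotides at other positions, which is the dependency the abstract reports.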
Keywords: Gene prediction; Machine learning; RNA
Year: 2016 PMID: 27257410 PMCID: PMC4867776 DOI: 10.1186/s13637-016-0042-0
Source DB: PubMed Journal: EURASIP J Bioinform Syst Biol ISSN: 1687-4145
Fig. 1 The learning of the neural network from the input data set
Fig. 2 Nucleotide i for iteration i in the proposed algorithm; the nucleotide at position i is the same (not changed)
Fig. 3 Nucleotide i for iteration i in the proposed algorithm; the nucleotide at position i is not the same (changed)
Fig. 4 Aligned gene sequence of nucleotides
Fig. 5 Neural network classification results
AB genotype rules for the Chinese data set
| Nucleotide position | Predicted genotype | Rule | |
|---|---|---|---|
| | T | CCCCCCCCCCCCCCCCCCCCCCCCCCTCCCCCCCCCCCCCCCCCC | |
| | T | GGTGGGTGGGGAGGAGGGGGGGGAGGAAAAAAAAAAGAAAGAA | |
| | G | GGGGGGGGGGGGGGGGGGGGGGGGGGGTGGGGGGGGGGGGGG | T |
| | C | GGGGGGGGGGGGGGGGGGGGGGGGGGGCGGGGGGGGGGGGGG | G |
| | G | AAAGGAAAAAAAAAAGAACAAAAAAAAAAAAAAAAAAAAAAAAA | |
Fig. 6 Nucleotide correlation in the Chinese data set
Fig. 7 Nucleotide correlation in the Korean data set
Fig. 8 Prediction accuracy for the Korean and Chinese data sets