Qaisar Abbas, Syed Mansoor Raza, Azizuddin Ahmed Biyabani, Muhammad Arfan Jaffar.
Abstract
Finding non-coding RNA (ncRNA) genes has emerged over the past few years as a cutting-edge trend in bioinformatics. The annotation and interpretation of ncRNAs pose numerous computational intelligence (CI) challenges because they require domain-specific expert knowledge of CI techniques. Moreover, many predicted classes have not yet been experimentally verified. Recently, researchers have applied many CI methods to predict the classes of ncRNAs. However, these diverse CI approaches lack a definitive classification framework that would allow them to build on past studies. A few review papers have attempted to summarize CI approaches, but they focused on particular methodological viewpoints. Accordingly, in this article we summarize, in greater detail than previously available, the CI techniques for finding ncRNA genes. We differentiate our review from the existing body of research and concisely discuss the technical merits of the various techniques. Lastly, we review the limitations of ncRNA gene-finding CI methods with a view towards the development of new computational tools.
Keywords: Bayesian networks; DNA; computational intelligence; deep learning; gene; genetic algorithm; micro RNA; neural network; non-coding RNA; support vector machine
Year: 2016 PMID: 27918472 PMCID: PMC5192489 DOI: 10.3390/genes7120113
Source DB: PubMed Journal: Genes (Basel) ISSN: 2073-4425 Impact factor: 4.096
Figure 1. An example of the transcription process to produce protein with coding and non-coding RNA genes.
State-of-the-art computational intelligence (CI) techniques for finding non-coding RNA (ncRNA) genes from 2001 to 2016.
| Cited | Approach | 4 CMP | Results (%) | Methodology and Online Resource Tools |
|---|---|---|---|---|
| [ | A computational approach to identify genes for functional RNAs in genomic sequences. | √ | 2 S: 90%, 3 P: 99% | NN and SVM. Online tool unavailable. |
| [ | To detect ncRNA sequences. | × | −−−−−− | The support vector machine (SVM) algorithm was implemented in graphical processing units (GPUs) based parallel technology. Online tool unavailable. |
| [ | To differentiate between well-known classes and target predicted classes of messenger RNA (mRNA). | √ | −−−−−− | A new web-based interface was developed to detect ncRNAs. Available at |
| [ | A positive-sample-only learning algorithm is introduced to identify ncRNA. | × | 1 A: 80% | An SVM was used as the core learning machine and assessed by 5-fold cross-validation on the recovery of known ncRNA. Data available online at ( |
| [ | To introduce a method to differentiate between coding or non-coding RNA. | × | 3 P: 97%, 2 S: 98% | Supervised machine learning SVM is used to classify transcripts according to features they would have if transcripts coded for proteins. Online data source of mRNA at: RNAdb ( |
| [ | To identify ncRNA using six features extracted from the transcript’s nucleotide sequence. | × | −−−−−− | An SVM (coding potential calculator, CPC) identifies ncRNA using six features extracted from the transcript’s nucleotide sequence. Datasets used: Rfam and RNAdb for noncoding and EMBL CDS for coding. A web-based interface for CPC is available at |
| [ | The prediction of ncRNA genes using boosted genetic programming. | × | 1 A: 80% | A GA with 10-fold cross-validation was used to train and test the learning machine. Online tool unavailable. |
| [ | To classify micro RNAs (miRNAs) and to differentiate between normal and tumor tissues. | √ | −−−−−− | A multi-objective algorithm was developed by using four classifiers such as random tree (RT), random forest (RF), sequential minimal optimization (SMO) and logistic regression (LR). |
| [ | To automatically predict miRNA target. | √ | F-measure: 0.95 | The deep neural-network (DNN) was utilized to increase F-measure by 25% for prediction of miRNA targets. Available at ( |
| [ | To predict miRNAs targets. | × | 1 A: 90%, 2 S: 88%, 3 P: 94% | Contrast relaxing and convolutional neural network (CNN) methods. Online tool unavailable. |
| [ | To predict new miRNA, known as pre-miRNAs. | × | 1 A: 99.9%, 2 S: 99.8%, 3 P: 100% | A neural network (NN) classifier was used to predict miRNA. Online tool unavailable. |
| [ | To improve the performance and to predict the regulation of miRNA. | × | −−−−−−−− | The authors utilized an NN classifier to predict miRNA. Online tool unavailable. |
| [ | To predict a real pre-miRNA or a pseudo pre-miRNA. | √ | 2 S: 97.40%, 3 P: 95.85% | The authors utilized a multilayer artificial neural network (ANN) classifier. Online tool unavailable. |
| [ | A de novo prediction algorithm to identify ncRNA using features derived from sequence and structure of known ncRNA. | × | 2 S: 68%, 3 P: 70%, 1 A: 70% | NN-based meta-learner de novo predictor using folding, ensemble, and structure-based features. Online data and program found at: |
| [ | Fifteen disease-related ncRNA sequences associated with Alzheimer disease are used. | × | −−−−−− | From the NONCODE database [ |
| [ | To identify ncRNA genes using a genetic algorithm (GA). | × | −−−−−− | The observed sequence in real sequence data is used to motivate the use of GAs to quickly reject regions of the search space of ncRNAs. Online tool unavailable. |
| [ | To identify ncRNA using covariance searching. | × | −−−−−− | Covariance models for ncRNA gene finding are extremely powerful but also extremely computationally demanding. Online tool unavailable. |
| [ | A comparative genomic approach is used to detect ncRNA. | × | −−−−−− | Developed an efficient clustering method for finding potential ncRNAs in bacteria by clustering genomic sequences. Online tool unavailable. |
| [ | To identify real and pseudo miRNA using SVM with features that are present in local structure-sequence. | × | 1 A: 90% | A method to classify real and pseudo miRNA by applying SVM using local structure sequence features. Online tool unavailable. |
| [ | Computational identification of ncRNAs in | × | −−−−−− | Computational screen followed by Northern blot and transcript sequencing. Online tool unavailable. Data set is available only at: |
| [ | Identification of putative noncoding RNAs among the RIKEN mouse full-length cDNA collection. | × | −−−−−− | The authors identified nine ncRNAs. Online tool unavailable. Data set is available only at: |
| [ | Nineteen candidate ncRNAs were identified, including one with significant homology. | × | −−−−−− | The author used a base-composition statistics method to find a variety of ncRNAs. Online tool unavailable. |
| [ | ncRNA gene detection using comparative sequence analysis. | √ | 2 S: 97.3%, 3 P: 100% | Comparative sequence analysis algorithm with “pair grammars” based on stochastic and hidden Markov models (HMM). Online tool unavailable. |
1 A: Accuracy, 2 S: Sensitivity, 3 P: Specificity, and 4 CMP: Comparisons, √: Compared and ×: Not compared.
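
The result columns above report accuracy (A), sensitivity (S), specificity (P), and in one case an F-measure. As a minimal sketch of how these figures relate to a binary confusion matrix, the following computes all four from raw counts. The function name and the example counts are hypothetical and are not taken from any of the cited studies; the F-measure here is assumed to be the balanced F1 score.

```python
def classification_metrics(tp, fp, tn, fn):
    """Compute standard binary-classification metrics from confusion-matrix counts.

    tp/fn: real ncRNAs predicted as ncRNA / missed.
    tn/fp: non-ncRNAs correctly rejected / wrongly accepted.
    All values are returned as fractions in [0, 1].
    """
    total = tp + fp + tn + fn
    accuracy = (tp + tn) / total if total else 0.0
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0   # true-positive rate (recall)
    specificity = tn / (tn + fp) if (tn + fp) else 0.0   # true-negative rate
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    denom = precision + sensitivity
    # Balanced F-measure (F1): harmonic mean of precision and sensitivity.
    f_measure = 2 * precision * sensitivity / denom if denom else 0.0
    return {
        "accuracy": accuracy,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "f_measure": f_measure,
    }


# Hypothetical counts: 100 true ncRNAs and 100 negatives.
metrics = classification_metrics(tp=90, fp=1, tn=99, fn=10)
for name, value in metrics.items():
    print(f"{name}: {value:.1%}")
```

With these illustrative counts, sensitivity is 90% and specificity is 99%. A study that reports only one of the four values leaves the others undetermined, which is one reason rows in the table are hard to compare directly.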
A brief summary of CI techniques with respect to classification algorithms.
| Year | Computational Intelligence |
|---|---|
| 2016 | Multi-classifiers (RT, RF, SMO, LR), DNN |
| 2015 | CNN, SVM, NN |
| 2012 | NN |
| 2009 | ANN, De novo NN, Hybrid Methods (HMs) |
| 2008 | Z-curve, GA |
| 2007 | SVM-Coding |
| 2006 | SVM and Covariance model parameter estimation |
| 2005 | GA, SVM, HMMs |
| 2002 | Local base-composition statistics |
| 2001 | Single-hidden layer NNs and SVMs, Comparative sequence analysis algorithm based on HMMs |
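
To make the simplest entry in this summary concrete, local base-composition statistics can be illustrated by scanning a sequence with a sliding window and flagging windows whose GC fraction differs from the sequence-wide background. This is a generic sketch and not the procedure of the cited study; the function names, window size, step, and threshold are all assumptions chosen for illustration.

```python
def gc_fraction(seq):
    """Fraction of G and C bases in a nucleotide sequence (case-insensitive)."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return sum(1 for base in seq if base in "GC") / len(seq)


def windows_above_background(seq, window=50, step=10, min_excess=0.1):
    """Return the background GC fraction and the windows that exceed it.

    A window is reported when its GC fraction is at least `min_excess`
    above the GC fraction of the whole sequence. All parameter defaults
    are illustrative assumptions, not published values.
    """
    background = gc_fraction(seq)
    hits = []
    for start in range(0, max(len(seq) - window + 1, 0), step):
        frac = gc_fraction(seq[start:start + window])
        if frac - background >= min_excess:
            hits.append((start, start + window, frac))
    return background, hits


# Toy sequence: an AT-rich background with a GC-rich island in the middle.
toy = "AT" * 50 + "GC" * 25 + "AT" * 50
background, hits = windows_above_background(toy)
print(f"background GC: {background:.2f}")
for start, end, frac in hits:
    print(f"window {start}-{end}: GC {frac:.2f}")
```

A real screen would need a statistical test and a genome-specific background model; the fixed threshold here only shows the shape of the computation.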
State-of-the-art CI online databases for the development of CI techniques.
| Cited | Databases | Web-Links |
|---|---|---|
| [ | RNALOSS | |
| [ | RNAdb | |
| [ | NONCODE | |
| [ | Rfam | |
| [ | RSEARCH | |
| [ | EICO | |
Figure 2. Computational techniques (percentage) used since 2001. MOA: Massive online analysis, DNN: Deep neural network, CNN: Convolutional neural network, SVM: Support vector machine, NNs: Neural networks, ANN: Artificial neural network, GA: Genetic algorithm, HMMs: Hidden Markov models.