Saed Khawaldeh, Usama Pervaiz, Mohammed Elsharnoby, Alaa Eddin Alchalabi, Nayel Al-Zubi.
Abstract
Taxonomic classification has a wide range of applications, such as tracing evolutionary history. Compared with the estimated number of organisms that nature harbors, humanity has only a limited understanding of the specific classes to which they belong. Living organisms can be classified using many machine learning techniques; in this study, classification is performed with convolutional neural networks. In addition, a DNA encoding technique is incorporated into the algorithm to increase performance and avoid misclassifications. The proposed algorithm outperformed state-of-the-art algorithms in terms of accuracy and sensitivity, which illustrates its high potential for many other applications in genome analysis.
Keywords: DNA; convolutional neural networks; encoding; genes; taxonomic classification
Year: 2017 PMID: 29149087 PMCID: PMC5704239 DOI: 10.3390/genes8110326
Source DB: PubMed Journal: Genes (Basel) ISSN: 2073-4425 Impact factor: 4.096
Figure 1. Taxonomy Hierarchy of Modern Human.
Figure 2. Samples represent DNA sequences for Cytochrome C in human, chimpanzee and mouse.
Figure 3. Illustration of the convolutional neural network (CNN) model used.
Data preprocessing command-line parameters.
| This inputs the csv format file. The input file at this stage is in |
| This outputs the t7b format file. The output file at this stage is in torch-7-binary format (t7b). The generated files are used directly as the training and validation datasets for the Crepe training program component. |
Members of the variable train.
| A | |
| A | |
| A |
Training programs for Crepe.
| A unified file for all configurations for the dataset, model, trainer, tester and GUI |
| Provides a Data class; both training and validating datasets are instances of this class |
| The main driver program |
| Provides a Model class; it handles model creation, randomization and transformations during training |
| Provides a Mui class; uses the Scroll class to draw an nn.Sequential model in Qt |
| Provides a Scroll class that starts a scrollable Qt window to draw text or images |
| A Qt designer UI file corresponding to the scrollable Qt window |
| Provides a Test class; handles testing, giving you losses, errors and confusion matrices |
| Provides a Train class; handles training with Stochastic Gradient Descent (SGD) and supports things like momentum and weight decay |
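The last row mentions SGD with momentum and weight decay. As a rough illustration of what those two options do, the sketch below applies the textbook update rule to a toy one-parameter problem. It is not taken from Crepe's Train class: the function name `sgd_step`, the hyperparameter values, and the toy objective are all assumptions made for this example, and the exact formulation used by the Torch optimizer may differ.

```python
from typing import List, Tuple


def sgd_step(
    params: List[float],
    grads: List[float],
    velocity: List[float],
    lr: float = 0.01,
    momentum: float = 0.9,
    weight_decay: float = 1e-5,
) -> Tuple[List[float], List[float]]:
    """One textbook SGD update with classical momentum and L2 weight decay.

    Hypothetical helper for illustration; all default values are assumptions.
    """
    new_params, new_velocity = [], []
    for w, g, v in zip(params, grads, velocity):
        g = g + weight_decay * w   # weight decay: add the gradient of an L2 penalty
        v = momentum * v - lr * g  # momentum: blend the previous step with the new one
        new_velocity.append(v)
        new_params.append(w + v)   # move the parameter along the velocity
    return new_params, new_velocity


# Toy problem (assumed): minimise f(w) = w^2, whose gradient is 2w.
params, velocity = [1.0], [0.0]
for _ in range(200):
    grads = [2.0 * w for w in params]
    params, velocity = sgd_step(params, grads, velocity)
```

After the loop, `params[0]` has moved from 1.0 to a value close to the minimum at 0. Momentum smooths the trajectory across steps, while weight decay pulls every weight slightly toward zero on each update.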
Labels used for different classes in the dataset.
| Order Name | Label |
|---|---|
| Chaetothyriomycetes | 1 |
| Diptera | 2 |
| Echinoida | 3 |
| Forcipulatida | 4 |
| Lepidoptera | 5 |
| Onygenales | 6 |
| Pezizomycetes | 7 |
| Scleractinia | 8 |
| Valvatida | 9 |
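The label table above can be expressed as a simple lookup, which is handy when preparing labelled training rows or decoding predictions. This is a minimal sketch: the names `ORDER_TO_LABEL`, `LABEL_TO_ORDER` and `label_for` are hypothetical and not part of the original tooling. Only the order names and their integer labels come from the table.

```python
# Order name -> integer class label, copied from the table above.
# Labels start at 1 rather than 0.
ORDER_TO_LABEL = {
    "Chaetothyriomycetes": 1,
    "Diptera": 2,
    "Echinoida": 3,
    "Forcipulatida": 4,
    "Lepidoptera": 5,
    "Onygenales": 6,
    "Pezizomycetes": 7,
    "Scleractinia": 8,
    "Valvatida": 9,
}

# Reverse mapping, for turning a predicted label back into an order name.
LABEL_TO_ORDER = {label: name for name, label in ORDER_TO_LABEL.items()}


def label_for(order_name: str) -> int:
    """Return the class label for an order name; raises KeyError if unknown."""
    return ORDER_TO_LABEL[order_name]
```

For example, `label_for("Diptera")` returns 2 and `LABEL_TO_ORDER[9]` returns `"Valvatida"`.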
Figure 4. Encoded DNA sample from Class 9.
Figure 5. Confusion Matrix.
Figure A1. Confusion matrix, precision, and recall values for all classes.
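Per-class precision and recall (recall is also called sensitivity) follow directly from a confusion matrix. The sketch below shows the arithmetic under two assumptions: rows index the true class and columns the predicted class, and the 3-class matrix `toy` is invented for illustration. It does not reproduce the values reported in the figure, and `precision_recall` is a hypothetical helper name.

```python
from typing import List, Tuple


def precision_recall(confusion: List[List[int]]) -> List[Tuple[float, float]]:
    """Return (precision, recall) for each class.

    Assumes confusion[i][j] counts samples of true class i predicted as class j.
    """
    n = len(confusion)
    results = []
    for k in range(n):
        true_positive = confusion[k][k]
        predicted_k = sum(confusion[i][k] for i in range(n))  # column sum
        actual_k = sum(confusion[k])                          # row sum
        precision = true_positive / predicted_k if predicted_k else 0.0
        recall = true_positive / actual_k if actual_k else 0.0
        results.append((precision, recall))
    return results


# Made-up 3-class confusion matrix, not data from the study.
toy = [
    [8, 1, 1],
    [2, 7, 1],
    [0, 0, 10],
]
scores = precision_recall(toy)
```

For the first class of the toy matrix, 8 of the 10 samples predicted as that class are correct (precision 0.8), and 8 of its 10 true samples are recovered (recall 0.8).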