| Literature DB >> 31388036 |
Alok Sharma1,2,3,4, Edwin Vans5,6, Daichi Shigemizu7,8,9,10, Keith A Boroevich7, Tatsuhiko Tsunoda11,12,13,14.
Abstract
It is critical, but difficult, to catch the small variations in genomic or other kinds of data that differentiate phenotypes or categories. A plethora of data is available, but the information from its genes or elements is arbitrarily scattered, making it challenging to extract the details relevant for identification. However, arranging similar genes into clusters makes these differences more accessible and allows more robust identification of hidden mechanisms (e.g. pathways) than dealing with elements individually. Here we propose DeepInsight, which converts non-image samples into a well-organized image form. Thereby, the power of convolutional neural networks (CNNs), including GPU utilization, can be realized for non-image samples. Furthermore, DeepInsight enables feature extraction through the application of a CNN to non-image samples to seize imperative information, and it has shown promising results. To our knowledge, this is the first work to apply a CNN simultaneously to different kinds of non-image datasets: RNA-seq, vowels, text, and artificial.
Year: 2019 PMID: 31388036 PMCID: PMC6684600 DOI: 10.1038/s41598-019-47765-6
Source DB: PubMed Journal: Sci Rep ISSN: 2045-2322 Impact factor: 4.379
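The core idea of the abstract, mapping each *feature* of a non-image sample to a fixed 2-D pixel location so that every sample becomes an image, can be sketched as follows. This is an illustrative approximation only: the paper projects features with t-SNE or kernel PCA, whereas this sketch uses plain PCA via SVD as a stand-in, and the function name, grid size, and collision-averaging rule are assumptions, not the authors' implementation.

```python
import numpy as np

def deepinsight_transform(X, pixels=32):
    """Sketch of the DeepInsight feature-vector-to-image idea.

    X: (n_samples, n_features) data matrix.
    Returns: (n_samples, pixels, pixels) array of image-form samples.
    """
    # Embed the *features* (rows of X.T) in a 2-D plane.
    # The paper uses t-SNE or kernel PCA; PCA via SVD stands in here.
    F = X.T - X.T.mean(axis=0)                      # features as points
    U, s, _ = np.linalg.svd(F, full_matrices=False)
    coords = U[:, :2] * s[:2]                       # (n_features, 2)

    # Map plane coordinates onto a pixels x pixels grid.
    mn, mx = coords.min(axis=0), coords.max(axis=0)
    grid = np.floor((coords - mn) / (mx - mn + 1e-12) * (pixels - 1)).astype(int)

    # Rasterize each sample; features landing on the same pixel are averaged.
    counts = np.zeros((pixels, pixels))
    np.add.at(counts, (grid[:, 0], grid[:, 1]), 1)
    images = np.zeros((X.shape[0], pixels, pixels))
    for i, sample in enumerate(X):
        img = np.zeros((pixels, pixels))
        np.add.at(img, (grid[:, 0], grid[:, 1]), sample)
        images[i] = img / np.maximum(counts, 1)
    return images
```

Because the 2-D layout is computed once from the whole feature set, every sample is rendered onto the same pixel arrangement, which is what lets a CNN learn spatially local patterns across samples.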
Summary of Datasets.
| Datasets | #samples | #features | #classes |
|---|---|---|---|
| RNA-seq | 6216 | 60483 | 10 |
| Vowels | 12579 | 39 | 10 |
| Relathe | 1427 | 4322 | 2 |
| Madelon | 2600 | 500 | 2 |
| Ringnorm-DELVE | 7400 | 20 | 2 |
Classification accuracy on different kinds of datasets using various models.
| Datasets | Decision Tree | Ada-Boost | Random Forest | DeepInsight |
|---|---|---|---|---|
| RNA-seq | 85% | 84% | 96% | |
| Vowels | 75% | 45% | 90% | |
| Text (Relathe) | 87% | 85% | 90% | |
| Artificial (Madelon) | 65% | 60% | 62% | |
| Artificial (Ringnorm-DELVE) | 90% | 93% | 94% | |
Figure 1. DeepInsight pipeline. (a) Illustration of the transformation from a feature vector to a feature matrix. (b) Illustration of the DeepInsight methodology to transform a feature vector into image pixels.
Figure 2. DeepInsight network: an illustration. (a) Illustration of two types of tumors using the image-transformation methodology of DeepInsight. The difference between the two types can be visualized at various points. These image samples are then fed to the deep learning architecture (DLA), i.e., the parallel CNN depicted in part (b). (b) The parallel CNN architecture used in DeepInsight. It consists of two parallel CNN branches, each with four convolutional layers. Parameters are tuned using the Bayesian optimization technique.
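The two-branch architecture described in the Figure 2 caption can be sketched in numpy as a forward pass: two parallel stacks of four convolution + ReLU layers whose outputs are concatenated. The kernel sizes, random weights, and function names here are illustrative assumptions; the paper tunes the actual layer parameters with Bayesian optimization.

```python
import numpy as np

def conv2d(img, kernel):
    """Valid-mode 2-D convolution (cross-correlation), numpy only."""
    kh, kw = kernel.shape
    h, w = img.shape
    out = np.zeros((h - kh + 1, w - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(img[i:i + kh, j:j + kw] * kernel)
    return out

def parallel_cnn_features(img, rng=None):
    """Two parallel branches, each a stack of four conv+ReLU layers,
    concatenated at the end -- the shape of the architecture in Fig. 2b.
    One branch per kernel size stands in for the paper's two parallel
    CNNs; depths and kernel sizes are illustrative, not the tuned values.
    """
    rng = rng or np.random.default_rng(0)
    feats = []
    for ksize in (3, 5):                          # one branch per kernel size
        x = img
        for _ in range(4):                        # four conv layers per branch
            k = rng.standard_normal((ksize, ksize)) * 0.1
            x = np.maximum(conv2d(x, k), 0.0)     # convolution + ReLU
        feats.append(x.ravel())
    return np.concatenate(feats)                  # merge the two branches
```

In practice the branches would be trained jointly with learned filters and a classification head; this sketch only shows how the parallel paths process the same DeepInsight image and merge their feature maps.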
Figure 3. Revealing patterns by DeepInsight. An illustration showing the different patterns achieved by DeepInsight on gene expression (different kinds of cancers), text (two types of text), and vowels (two types of vowels). Each plot shows a transformed sample; the differences between samples can now be noticed straightforwardly.