Amit Kumar Banerjee, Sunita M, Naveen M, Upadhyayula Suryanarayana Murty.
Abstract
Biological systems are highly organized, tightly coordinated, and complex. The growth of secondary data and advances in modern data mining techniques provide an opportunity to discover hidden intra- and inter-relations within these nonlinear datasets, which should help in understanding complex biological phenomena more efficiently. In this paper we report a comparative classification of Pyruvate Dehydrogenase protein sequences from bacterial sources based on 28 physicochemical parameters (such as bulkiness, hydrophobicity, total positively and negatively charged residues, α-helices, and β-strands) and the compositions of the 20 amino acid types. Five methods were compared: Logistic, MLP (Multi Layer Perceptron), SMO (Sequential Minimal Optimization), RBFN (Radial Basis Function Network), and SL (Simple Logistic). MLP was the best method, with a maximum average accuracy of 88.20%. The same dataset was then clustered using a 2 × 2 grid of a two-dimensional SOM (Self Organizing Map). Clustering analysis revealed the proximity of the unannotated sequences to the Mycobacterium and Synechococcus genera.
Keywords: Clustering; Data Mining; KNIME; Pyruvate Dehydrogenase; Self Organizing Maps (SOM)
Year: 2010 PMID: 20975910 PMCID: PMC2951700 DOI: 10.6026/97320630004456
Source DB: PubMed Journal: Bioinformation ISSN: 0973-2063
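As an illustration of two steps named in the abstract, the 20-type amino acid composition and clustering on a 2 × 2 SOM grid, the following is a minimal pure-Python sketch. It is not the authors' KNIME workflow: the function names, the training schedule (linear decay of learning rate and neighbourhood radius), and the toy sequences are all assumptions made for illustration, and the 28 physicochemical parameters are not computed here.

```python
import math
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def aa_composition(seq):
    """Return the fraction of each of the 20 standard amino acids in seq."""
    seq = seq.upper()
    counts = [seq.count(aa) for aa in AMINO_ACIDS]
    total = sum(counts)
    if total == 0:
        raise ValueError("sequence contains no standard residues")
    return [c / total for c in counts]


def best_matching_unit(weights, v):
    """Grid node whose weight vector is closest (squared Euclidean) to v."""
    return min(
        weights,
        key=lambda node: sum((a - b) ** 2 for a, b in zip(weights[node], v)),
    )


def train_som(vectors, rows=2, cols=2, epochs=200, lr0=0.5, seed=0):
    """Train a small rectangular SOM; returns {(row, col): weight vector}.

    Nodes are numbered from 1 so that they read like "cluster 1, 2".
    The decay schedule below is an assumed, commonly used choice.
    """
    rng = random.Random(seed)
    dim = len(vectors[0])
    # Initialise each node with a copy of a randomly chosen input vector.
    weights = {
        (r, c): list(rng.choice(vectors))
        for r in range(1, rows + 1)
        for c in range(1, cols + 1)
    }
    sigma0 = max(rows, cols) / 2.0
    for epoch in range(epochs):
        frac = epoch / epochs
        lr = lr0 * (1.0 - frac)                # decaying learning rate
        sigma = sigma0 * (1.0 - frac) + 0.01   # shrinking neighbourhood
        for v in rng.sample(vectors, len(vectors)):  # shuffled pass
            bmu = best_matching_unit(weights, v)
            for node, w in weights.items():
                grid_d2 = (node[0] - bmu[0]) ** 2 + (node[1] - bmu[1]) ** 2
                h = math.exp(-grid_d2 / (2.0 * sigma ** 2))
                for i in range(dim):
                    w[i] += lr * h * (v[i] - w[i])
    return weights


# Hypothetical toy sequences, not taken from the study's dataset.
toy_sequences = [
    "MKKLLAAAGGVVLLKKAA",
    "MKRLLAAAGGVILLKKAA",
    "MDEDEDSTSTPPQQNNWW",
    "MDEEEDSTSSPPQQNNWY",
]
features = [aa_composition(s) for s in toy_sequences]
som = train_som(features)
assignments = [best_matching_unit(som, f) for f in features]
```

Each entry of `assignments` is the grid coordinate of the node a sequence falls into, which is how unannotated sequences could be compared with annotated ones sharing the same node.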
Figure 1. Workflow representation of the whole study
Figure 2. Average accuracy achieved versus methodology adopted
Figure 3. Representation of highest accuracy obtained with respective methodology adopted and dataset used
Figure 4. Classification efficiency of different methods with respect to dataset
Figure 5. Distribution of sequences considered in this study according to their cluster formation
Figure 6. Visual representation of cluster 1, 1
Figure 7. Visual representation of cluster 1, 2
Figure 8. Visual representation of cluster 2, 1
Figure 9. Visual representation of cluster 2, 2