Machine learning based pattern recognition applied to microarray data.

Barry K Lavine, Charles E Davidson, William S Rayens.

Abstract

MOTIVATION: Microarrays have allowed the expression levels of thousands of genes or proteins to be measured simultaneously. Data sets generated by these arrays consist of a small number of observations (e.g., 20-100 samples) on a very large number of variables (e.g., 10,000 genes or proteins). The observations in these data sets often have other attributes associated with them, such as a class label denoting the pathology of the subject. Finding the genes or proteins that are correlated with these attributes is often difficult, since most of the variables carry no information about the pathology and can therefore mask the identity of the relevant features. We describe a genetic algorithm (GA) that employs both supervised and unsupervised learning to mine gene expression and proteomic data. The pattern recognition GA selects features that increase clustering while simultaneously searching for features that optimize the separation of the classes in a plot of the two or three largest principal components of the data. Because the largest principal components capture the bulk of the variance in the data, the features chosen by the GA contain information primarily about differences between classes in the data set. The principal component analysis routine embedded in the fitness function of the GA acts as an information filter, significantly reducing the size of the search space, since it restricts the search to feature sets whose principal component plots show clustering on the basis of class. The algorithm integrates aspects of artificial intelligence and evolutionary computation to yield a one-pass procedure for feature selection, clustering, classification, and prediction.
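
The idea of a GA whose fitness function embeds a principal component analysis can be illustrated with a minimal Python sketch. This is not the authors' implementation: the abstract does not specify the fitness function, the genetic operators, or any parameter values, so everything below is an assumption made for illustration. In particular, the nearest-neighbour class-agreement score used as the fitness, the uniform crossover, the bit-flip mutation, and the names `pc_scores`, `fitness`, `select_features`, and `make_toy_data` are all hypothetical.

```python
import numpy as np


def pc_scores(X, n_components=2):
    """Project mean-centered samples onto their largest principal components."""
    Xc = X - X.mean(axis=0)
    # Rows of Vt are the principal axes, ordered by decreasing singular value.
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:n_components].T


def fitness(X, y, mask, k=3):
    """Stand-in fitness (an assumption, not the paper's function): fraction of
    each sample's k nearest neighbours in the PC plot that share its class."""
    if mask.sum() < 2:
        return 0.0  # need at least two features for a two-component plot
    scores = pc_scores(X[:, mask])
    dist = np.linalg.norm(scores[:, None, :] - scores[None, :, :], axis=2)
    np.fill_diagonal(dist, np.inf)  # a sample is not its own neighbour
    nearest = np.argsort(dist, axis=1)[:, :k]
    return float((y[nearest] == y[:, None]).mean())


def select_features(X, y, n_pop=30, n_gen=40, p_on=0.05, p_mut=0.01, seed=0):
    """Evolve boolean feature masks; return the best mask seen and its fitness."""
    rng = np.random.default_rng(seed)
    n_feat = X.shape[1]
    pop = rng.random((n_pop, n_feat)) < p_on  # sparse random starting masks
    best_mask, best_fit = None, -1.0
    for _ in range(n_gen):
        fits = np.array([fitness(X, y, m) for m in pop])
        order = np.argsort(fits)[::-1]
        if fits[order[0]] > best_fit:
            best_fit, best_mask = float(fits[order[0]]), pop[order[0]].copy()
        parents = pop[order[: n_pop // 2]]  # keep the fitter half as parents
        children = []
        while len(children) < n_pop:
            a, b = parents[rng.integers(len(parents), size=2)]
            child = np.where(rng.random(n_feat) < 0.5, a, b)  # uniform crossover
            child ^= rng.random(n_feat) < p_mut  # bit-flip mutation
            children.append(child)
        pop = np.array(children)
    return best_mask, best_fit


def make_toy_data(seed=1):
    """Synthetic data: 40 samples, 200 features, only the first 5 informative."""
    rng = np.random.default_rng(seed)
    X = rng.normal(size=(40, 200))
    y = np.repeat([0, 1], 20)
    X[y == 1, :5] += 3.0
    return X, y


if __name__ == "__main__":
    X, y = make_toy_data()
    mask, fit = select_features(X, y)
    print("selected features:", np.flatnonzero(mask))
    print("fitness of best mask:", fit)
```

Because only feature subsets whose principal component plots show class clustering score well, the PCA step acts as the filter described in the abstract: subsets dominated by uninformative variables produce plots with mixed classes and are discarded by selection.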

Year:  2004        PMID: 15032659     DOI: 10.2174/138620704773120801

Source DB:  PubMed          Journal:  Comb Chem High Throughput Screen        ISSN: 1386-2073            Impact factor:   1.339


  2 in total

1.  Using OrPLS to Identify Asymptomatic Women at Risk For Alzheimer's Disease.

Authors:  William Rayens; Yushu Liu; Anders Andersen; Charles Smith
Journal:  J Chemother       Date:  2008-09       Impact factor: 1.714

2.  Establishing glucose- and ABA-regulated transcription networks in Arabidopsis by microarray analysis and promoter classification using a Relevance Vector Machine.

Authors:  Yunhai Li; Kee Khoon Lee; Sean Walsh; Caroline Smith; Sophie Hadingham; Karim Sorefan; Gavin Cawley; Michael W Bevan
Journal:  Genome Res       Date:  2006-01-19       Impact factor: 9.043

