Daniel Glez-Peña, Rodrigo Alvarez, Fernando Díaz, Florentino Fdez-Riverola.
Abstract
BACKGROUND: Expression profiling assays based on DNA microarray technology generate enormous data sets that are not amenable to simple analysis. The greatest challenge in making full use of this data is to develop algorithms that interpret and interconnect results from different genes under different conditions. In this context, fuzzy logic can provide a systematic and unbiased way to both (i) find biologically significant insights relating to meaningful genes, thereby removing the need for expert knowledge in the preliminary steps of microarray data analysis, and (ii) reduce the cost and complexity of the machine learning techniques applied later, so that they can yield interpretable models.
Year: 2009 PMID: 19178723 PMCID: PMC2637236 DOI: 10.1186/1471-2105-10-37
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169
Comparative analysis of R-based methods for gene selection
| Method | Bayesian model averaging (BMA) approach over the underlying classification model (logistic regression) | varSelRF uses the measures of variable importance (related to the classification) provided directly by the Random Forest algorithm | R-SVM uses a contribution factor of each feature (computed from the weights of the SVM classifier) | t-test | The selected genes are based on the induced fuzzy pattern for each class |
| Type of classification | Multiclass | Multiclass | Binary classifications | Binary classifications | Multiclass |
| Dependence among features | Multivariate | Multivariate | Multivariate | Univariate | Univariate |
| Remarks | The method facilitates biological interpretation by producing posterior probabilities of selected genes and models. BMA accounts for the uncertainty about the best set to choose by averaging over multiple models | The method does not require pre-specifying the number of genes to be selected, but instead chooses the number of genes adaptively | The algorithm is based on the repeated application of the SVM classifier over progressively smaller sets of genes (where genes are excluded according to the defined contribution factor) until a satisfactory solution is achieved. The number of iterations and the number of features to be selected in each iteration are very | The computational effort is smaller than that of multivariate methods | It does not require any assumption about the distribution of the expression levels and |
Figure 1. Shape of the membership functions for a specific gene and the possible assigned labels given a threshold θ = 0.7. The centre and amplitude of each membership function depend on the mean and on the variability of the available data, respectively. The Medium membership function is symmetric, whereas the Low and High functions are asymmetric at the extremes.
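The labelling idea in the Figure 1 caption can be sketched as follows. This is only an illustration under stated assumptions, not the package's actual implementation: the Gaussian shape for Medium, the sigmoid shapes for Low and High, the `spread / 4` slope, and the names `memberships` and `assign_labels` are all hypothetical choices made for this example. Only the threshold θ = 0.7 and the symmetric/asymmetric distinction come from the caption.

```python
import math


def _sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def memberships(x, mean, spread):
    """Return Low/Medium/High membership degrees for one expression value.

    Assumed shapes (hypothetical, for illustration only):
    Medium is a symmetric Gaussian bell centred on the mean;
    Low and High are sigmoids that saturate at 1 towards the extremes.
    """
    slope = spread / 4.0  # assumed steepness of the Low/High shoulders
    medium = math.exp(-0.5 * ((x - mean) / spread) ** 2)
    low = _sigmoid(-(x - (mean - spread)) / slope)
    high = _sigmoid((x - (mean + spread)) / slope)
    return {"Low": low, "Medium": medium, "High": high}


def assign_labels(x, mean, spread, theta=0.7):
    """Return every label whose membership degree exceeds theta.

    The result may be empty or contain two adjacent labels,
    depending on theta and on how much the functions overlap.
    """
    degrees = memberships(x, mean, spread)
    return [label for label, mu in degrees.items() if mu > theta]


if __name__ == "__main__":
    for value in (-3.0, 0.0, 3.0):
        print(value, assign_labels(value, mean=0.0, spread=1.0))
```

In this sketch, lowering `theta` widens the region in which two labels apply at once, while raising it leaves more expression values without any label.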
Figure 2. Membership functions belonging to the first two genes. Vertical lines show the expression values corresponding to each microarray sample.
Figure 3. DFP of selected genes (in rows) with their appearance frequency for each category (in columns). In the first table, an NA value is assigned if the frequency of appearance is lower than or equal to the piVal parameter, meaning that the gene does not belong to the FP of that category.
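The masking rule in the Figure 3 caption can be illustrated with a short sketch for a single gene. It assumes that the frequency of appearance is the fraction of samples in a category that carry the category's most common label. The function name `fuzzy_pattern_frequencies`, the argument `pi_val`, and the use of `None` in place of NA are hypothetical choices for this example, not the package's interface.

```python
from collections import Counter


def fuzzy_pattern_frequencies(labels_by_class, pi_val):
    """Per-category (label, frequency) for one gene, masked by pi_val.

    labels_by_class maps a category name to the non-empty list of labels
    assigned to that gene in each sample of the category.
    A category whose top frequency is lower than or equal to pi_val
    gets None (standing in for NA): the gene is not part of its pattern.
    """
    table = {}
    for category, labels in labels_by_class.items():
        label, count = Counter(labels).most_common(1)[0]
        frequency = count / len(labels)
        table[category] = (label, frequency) if frequency > pi_val else None
    return table


if __name__ == "__main__":
    gene_labels = {
        "A": ["High", "High", "High", "Medium"],
        "B": ["Low", "Medium", "High", "Low"],
    }
    print(fuzzy_pattern_frequencies(gene_labels, pi_val=0.5))
```

The comparison is strict, so a frequency exactly equal to `pi_val` is masked, matching the "lower than or equal to" wording of the caption.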