Matilde L Sánchez-Peña, Clara E Isaza, Jaileene Pérez-Morales, Cristina Rodríguez-Padilla, José M Castro, Mauricio Cabrera-Ríos.
Abstract
Microarray experiments can determine the relative expression of tens of thousands of genes simultaneously, and therefore produce very large databases. Analyzing these databases and extracting biologically relevant knowledge from them are challenging tasks. Identifying potential cancer biomarker genes is one of the most important aims of microarray analysis and, as such, has been widely targeted in the literature. However, identifying a set of these genes consistently across different experiments, studies, microarray platforms, or cancer types remains elusive. Besides the inherent difficulty of the large and nonconstant variability in these experiments and the incommensurability between different microarray technologies, users must adjust a series of parameters that significantly affect the outcome of the analyses and that have no biological or medical meaning. In this study, the identification of potential cancer biomarkers from microarray data is cast as a multiple criteria optimization (MCO) problem. The efficient solutions to this problem, found here through data envelopment analysis (DEA), are associated with genes that are proposed as potential cancer biomarkers. The method does not require any parameter adjustment by the user, and thus fosters repeatability. The approach also allows the simultaneous analysis of different microarray experiments, microarray platforms, and cancer types. The results include the analysis of three publicly available microarray databases related to cervix cancer. This study points to the feasibility of modeling the selection of potential cancer biomarkers from microarray data as an MCO problem and solving it using DEA.
Using MCO offers a new perspective on the identification of potential cancer biomarkers: it does not require defining a threshold value to establish significance for a particular gene, and selecting a normalization procedure to compare different experiments is no longer necessary.
Keywords: Cancer biomarkers; cervical cancer; data envelopment analysis; microarray data analysis; multiple criteria optimization
Year: 2013 PMID: 23634293 PMCID: PMC3639664 DOI: 10.1002/cam4.69
Source DB: PubMed Journal: Cancer Med ISSN: 2045-7634 Impact factor: 4.452
Figure 1. Schematic example of how to obtain a P-value. This schematic shows how one P-value is obtained for a particular gene in a microarray experiment with l = 3 healthy tissues as controls and m = 3 tissues with cancer. If a statistical comparison is carried out for each gene, the result is n genes, each with an associated P-value.
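The per-gene comparison in Figure 1 can be illustrated with a minimal sketch. The source does not say which statistical test is used, so the exact two-sided permutation test on the difference in group means below is an assumption chosen only because it needs no external libraries. The function name `permutation_p_value`, the gene labels, and the expression values are all hypothetical.

```python
from itertools import combinations
from statistics import mean


def permutation_p_value(controls, cancers):
    """Exact two-sided permutation P-value for the difference in group means.

    Assumption: this test stands in for whatever comparison the study used.
    """
    pooled = list(controls) + list(cancers)
    n_controls = len(controls)
    observed = abs(mean(cancers) - mean(controls))
    indices = range(len(pooled))
    extreme = 0
    total = 0
    # Enumerate every way of relabeling n_controls samples as "control".
    for control_idx in combinations(indices, n_controls):
        chosen = set(control_idx)
        group_a = [pooled[i] for i in indices if i in chosen]
        group_b = [pooled[i] for i in indices if i not in chosen]
        diff = abs(mean(group_b) - mean(group_a))
        # Small tolerance guards against floating-point ties.
        if diff >= observed - 1e-12:
            extreme += 1
        total += 1
    return extreme / total


# Hypothetical expression values: (l = 3 controls, m = 3 cancer tissues) per gene.
expression = {
    "gene_A": ([1.0, 1.2, 0.9], [3.1, 2.8, 3.4]),
    "gene_B": ([2.0, 2.5, 1.8], [2.1, 1.9, 2.4]),
}

# One P-value per gene, as in the caption: n genes, n P-values.
p_values = {
    gene: permutation_p_value(controls, cancers)
    for gene, (controls, cancers) in expression.items()
}
print(p_values)
```

With l = m = 3 there are only 20 possible relabelings, so the smallest two-sided P-value this particular test can return is 2/20 = 0.1, which is what the fully separated `gene_A` yields.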
Figure 2. Pareto-efficient frontier. When the performance measures conflict, no single gene is best on all of them; instead, the attractive genes are those lying on the southwest envelope of the gene set. In multiple criteria optimization (MCO), that envelope is called a Pareto-efficient frontier, and it is formed by Pareto-efficient solutions.
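Stated formally, as a sketch under the assumption that every performance measure is to be minimized (consistent with the southwest envelope in the caption) and with notation introduced here for illustration only: let gene $i$ have performance values $y_{i1}, \dots, y_{iK}$. Gene $i$ dominates gene $j$ when

$$
y_{ik} \le y_{jk} \ \text{for all } k \in \{1, \dots, K\}
\quad \text{and} \quad
y_{ik} < y_{jk} \ \text{for at least one } k .
$$

A gene is Pareto-efficient if no other gene in the set dominates it, and the Pareto-efficient frontier is the set of all such genes.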
Figure 3. The two performance measures for each gene. This figure schematically shows a case with genes characterized by two performance measures: an untransformed P-value and one transformed with equation (1). Following the proposed method, it is recommended at this point to identify the first 10 efficient frontiers. This is done by identifying the genes in the first efficient frontier through data envelopment analysis (DEA), removing them from the set, and running a second DEA iteration on the remaining genes. The procedure is repeated until the tenth frontier is identified. A method to determine an adequate number of frontiers to analyze is currently under development by our research group.
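The identify-remove-repeat loop described in the caption can be sketched as follows. Note the substitution: this sketch uses a plain pairwise Pareto-dominance filter rather than DEA, which is a linear-programming envelope and may flag a different set of genes. It also assumes both performance measures are minimized. The names `dominates` and `peel_frontiers`, the gene labels, and the values are hypothetical.

```python
def dominates(a, b):
    """True if a is no worse than b on every measure and strictly better on one.

    Assumption: all performance measures are to be minimized.
    """
    no_worse = all(x <= y for x, y in zip(a, b))
    strictly_better = any(x < y for x, y in zip(a, b))
    return no_worse and strictly_better


def peel_frontiers(genes, max_frontiers=10):
    """Return successive non-dominated frontiers, removing each before the next."""
    remaining = dict(genes)
    frontiers = []
    while remaining and len(frontiers) < max_frontiers:
        # A gene is on the current frontier if no remaining gene dominates it.
        frontier = [
            gene
            for gene, values in remaining.items()
            if not any(
                dominates(other_values, values)
                for other, other_values in remaining.items()
                if other != gene
            )
        ]
        frontiers.append(sorted(frontier))
        # Remove the frontier and iterate on what is left.
        for gene in frontier:
            del remaining[gene]
    return frontiers


# Hypothetical genes, each with two performance measures (both minimized).
genes = {
    "g1": (0.01, 0.40),
    "g2": (0.20, 0.05),
    "g3": (0.10, 0.10),
    "g4": (0.30, 0.45),
    "g5": (0.15, 0.50),
}

frontiers = peel_frontiers(genes, max_frontiers=10)
for rank, frontier in enumerate(frontiers, start=1):
    print(rank, frontier)
```

In this toy set, `g1`, `g2`, and `g3` form the first frontier; once they are removed, `g4` and `g5` no longer have anything dominating them and form the second. The loop stops early here because the set is exhausted before ten frontiers are reached.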
List of the 28 genes identified in the first 10 frontiers of the proposed multiple criteria optimization (MCO) problem
| Frontier | Accession number | Symbol | Name | Expression in cervix cancer (using data from Wong et al.) |
|---|---|---|---|---|
| 1 | AA488645 | | NGFI-A-binding protein 1 (EGR1 binding protein 1) | Underexpressed |
| 2 | H22826 | | LIM domain 7 | Overexpressed |
| 3 | AI553969 | | Karyopherin α6 (importin α7) | Overexpressed |
| 3 | T71316 | | ADP-ribosylation factor 4 | Overexpressed |
| 3 | AA243749 | | Discoidin domain receptor tyrosine kinase 2 | Overexpressed |
| 3 | AA460827 | | Protein phosphatase 1, regulatory (inhibitor) subunit 1A | Underexpressed |
| 4 | AA454831 | | EST: zx79c10.s1 | Overexpressed |
| 4 | AA913408, AA913864 | | DNA damage repair and recombination protein RAD52 pseudogene | Overexpressed |
| 5 | AA487237 | | Ubiquitin protein ligase E3A | Underexpressed |
| 5 | AA446565 | | RNA-binding motif protein 25 | Overexpressed |
| 6 | H23187 | | Carbonic anhydrase II | Overexpressed |
| 7 | AI221445 | | Potassium voltage-gated channel, Isk-related family, member 3 | Overexpressed |
| 7 | R36086 | | EST: yh88d01.s1 | Underexpressed |
| 7 | AA282537 | | Hypothetical protein LOC729991 | Overexpressed |
| 8 | N93686 | | Aldehyde dehydrogenase 3 family, member B1 | Underexpressed |
| 8 | R91078 | | Cytochrome P450, family 3, subfamily A, polypeptide 7 | Overexpressed |
| 8 | R44822 | | Phosphoribosyl pyrophosphate synthetase-associated protein 1 | Underexpressed |
| 9 | AI334914 | | Integrin, alpha 2b (platelet glycoprotein IIb of IIb/IIIa complex, antigen CD41) | Overexpressed |
| 9 | R93394 | | Transcribed locus | Overexpressed |
| 9 | AA621155 | | MutS homolog 5 ( | Underexpressed |
| 9 | AA705112 | | Molybdenum cofactor synthesis 1 | Overexpressed |
| 9 | R52794 | | Protein tyrosine phosphatase, receptor type, T | Underexpressed |
| 10 | AA424344 | | Uroporphyrinogen decarboxylase | Overexpressed |
| 10 | H69876 | | Hypothetical LOC100132707 | Underexpressed |
| 10 | H55909 | | Serine/arginine-rich splicing factor 1 | Underexpressed |
| 10 | W74657 | | Kruppel-like factor 2 (lung) | Overexpressed |
| 10 | AI017398 | | Amiloride-sensitive cation channel 2, neuronal | Overexpressed |
| 10 | H99699 | | Polymerase (RNA) III (DNA directed) polypeptide H (22.9 kD) | Overexpressed |
The table shows the complete list of genes identified in the first 10 efficient frontiers. The last column shows the change in expression from the normal state to the cancer state.
List of genes from the cross-validation study