Grzegorz Zycinski, Annalisa Barla, Margherita Squillario, Tiziana Sanavia, Barbara Di Camillo, Alessandro Verri.
Abstract
BACKGROUND: High-throughput (HT) technologies provide huge amounts of gene expression data that can be used to identify biomarkers useful in clinical practice. The most frequently used approaches first select a set of genes (i.e. a gene signature) able to characterize differences between two or more phenotypic conditions, and then provide a functional assessment of the selected genes with an a posteriori enrichment analysis based on biological knowledge. However, this approach has several drawbacks. First, the gene selection procedure often requires tunable parameters that affect the outcome, typically producing many false hits. Second, a posteriori enrichment analysis relies on a mapping between biological concepts and gene expression measurements, which is hard to compute because of constant changes in biological knowledge and genome analysis. Third, this mapping is typically used to assess how well biological concepts cover the gene signature, and that assessment is either score-based or itself requires tunable parameters, limiting its power.
Year: 2013 PMID: 23302187 PMCID: PMC3605163 DOI: 10.1186/1751-0473-8-2
Source DB: PubMed Journal: Source Code Biol Med ISSN: 1751-0473
Figure 1. The difference between the classic approach and the KDVS approach. Illustration of the difference between the classic enrichment-based approach and the KDVS framework approach. In the classic approach, monolithic gene expression data are mined for significant genes, and prior biological knowledge is used a posteriori to verify the soundness of the result. This approach is sensitive to the choice of mining technique as well as of the enrichment verification method. In KDVS, monolithic gene expression data are first transformed according to prior knowledge (e.g. divided into smaller parts accordingly) and then mined for significant genes. This approach enables a wider choice of mining techniques and provides biological insight before the core mining step.
Figure 2. General activity schema of KDVS. Diagram of the general activity of KDVS.
Figure 3. Schema of the architecture of KDVS. KDVS consists of applications that work on a common ensemble of data. Each application uses the KDVS API component.
Figure 4. Activity schema of the experiment.py application. The experiment.py application creates the ensemble of data, performs data integration and transformation, manages the distributed computational environment, and performs the knowledge discovery procedure.
Figure 5. Activity schema of the postprocess.py application. The postprocess.py application performs supplementary activities on the ensemble of data (e.g. assembling final results, collecting useful statistics).
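The knowledge-driven transformation described in the Figure 1 caption can be sketched as follows. This is a minimal illustration, not KDVS code: it assumes the prior knowledge is available as a mapping from GO terms to gene identifiers, and the function name `split_by_prior_knowledge`, the gene names, and the term identifiers are all hypothetical.

```python
def split_by_prior_knowledge(expression, term_to_genes):
    """Divide one expression table into per-term subsets.

    expression:    dict mapping gene id -> list of measurements (one per sample)
    term_to_genes: dict mapping a prior-knowledge concept (e.g. a GO term)
                   to the gene ids annotated with it (assumed input format)
    """
    subsets = {}
    for term, genes in term_to_genes.items():
        # Keep only the genes that were actually measured.
        rows = {g: expression[g] for g in genes if g in expression}
        if rows:  # skip terms with no measured genes
            subsets[term] = rows
    return subsets


# Toy data; all identifiers are placeholders.
expression = {
    "GENE_A": [2.1, 0.4, 1.9],
    "GENE_B": [0.3, 0.2, 0.5],
    "GENE_C": [1.0, 1.2, 0.9],
}
term_to_genes = {
    "GO:term_1": ["GENE_A", "GENE_B"],
    "GO:term_2": ["GENE_B", "GENE_C", "GENE_D"],
    "GO:term_3": ["GENE_D"],
}

subsets = split_by_prior_knowledge(expression, term_to_genes)
for term, rows in subsets.items():
    print(term, sorted(rows))
```

Each subset can then be passed to the mining step on its own, so a gene annotated with several terms (here `GENE_B`) appears in several subsets.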
Results of the KDVS for Prostate cancer study
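| | Enrichment, PT vs M | Enrichment, PT vs N | KDVS MF, PT vs M | KDVS BP, PT vs M | KDVS MF, PT vs N | KDVS BP, PT vs N |
|---|---|---|---|---|---|---|
| Discr. GO terms | 60 | 120 | 1115 | 2242 | 320 | 689 |
| Discr. genes | 59 | 389 | 3619 | 4457 | 3118 | 3271 |
| Comm. GO terms | 27 | 61 | 375 | 1000 | 158 | 378 |
| Comm. genes | 7 | 51 | 418 | 504 | 334 | 371 |
| Bench. cov. GO terms | 1% | 2% | 46% | 41% | 19% | 16% |
| Bench. cov. genes | 1% | 6% | 49% | 59% | 39% | 44% |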
In the first two rows, the table shows the sizes of the discriminant gene and GO term lists identified by the enrichment-based approach and by KDVS for each classification problem solved: Primary Tumor (PT) versus Metastasis (M) and Primary Tumor (PT) versus Normal (N). For KDVS, in addition, each classification task was addressed twice, once per GO domain utilized: Molecular Function (MF) or Biological Process (BP). Discr. stands for discriminant.
In the next two rows, the table shows the sizes of the gene and GO term lists obtained as the intersection between the respective discriminant lists and the benchmark lists built for prostate cancer. Comm. means common.
In the last two rows, the table shows the coverage of the benchmark gene and GO term lists by the respective discriminant lists. For KDVS, the coverage was calculated only for benchmark MF and BP terms, respectively, according to the GO domain utilized in the classification task. The benchmark gene list consists of 851 elements. The benchmark GO term list consists of 2437 BP terms and 824 MF terms, 3593 terms in total. Bench. cov. means benchmark coverage.
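The coverage figures can be recomputed from the common counts and the benchmark sizes given above. The sketch below assumes that coverage is the common count divided by the relevant benchmark size, rounded to the nearest whole percent, that the third row of the table holds the common GO-term counts and the fourth the common gene counts, and that the two enrichment columns are measured against all 3593 benchmark GO terms. These are assumptions, not statements from the source, but the table's percentages are reproduced under this reading. The names `coverage_percent`, `BENCHMARK_GENES`, and `BENCHMARK_GO` are illustrative.

```python
# Benchmark sizes for the prostate cancer study, as stated in the table notes.
BENCHMARK_GENES = 851
BENCHMARK_GO = {"MF": 824, "BP": 2437, "all": 3593}


def coverage_percent(common, benchmark_size):
    """Share of a benchmark list recovered by a discriminant list, in whole percent."""
    return round(100 * common / benchmark_size)


# Common gene counts, one per table column (assumed to be the fourth row).
common_genes = [7, 51, 418, 504, 334, 371]
gene_cov = [coverage_percent(c, BENCHMARK_GENES) for c in common_genes]

# Common GO-term counts (assumed to be the third row), with the benchmark
# subset each column is compared against (assumed: all terms for enrichment,
# then MF, BP, MF, BP for the four KDVS columns).
common_go = [27, 61, 375, 1000, 158, 378]
go_domains = ["all", "all", "MF", "BP", "MF", "BP"]
go_cov = [coverage_percent(c, BENCHMARK_GO[d]) for c, d in zip(common_go, go_domains)]

print(gene_cov)  # [1, 6, 49, 59, 39, 44]
print(go_cov)    # [1, 2, 46, 41, 19, 16]
```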
Results of the KDVS for Parkinson disease study I & II
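| | Enrichment, study I | Enrichment, study II | KDVS, study I | KDVS, study II |
|---|---|---|---|---|
| Discr. GO terms | 77 | 65 | 364 | 150 |
| Discr. genes | 77 | 66 | 5705 | 4286 |
| Comm. GO terms | 31 | 31 | 113 | 54 |
| Comm. genes | 9 | 3 | 274 | 196 |
| Bench. cov. GO terms | 2% | 1% | 25% | 12% |
| Bench. cov. genes | 2% | 1% | 62% | 44% |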
In the first two rows, the table shows the sizes of the discriminant gene and GO term lists identified by the enrichment-based approach and by KDVS for Parkinson study I and II, respectively. The two studies were performed on two different microarray datasets of Parkinson (PD) and Normal (N) tissue samples. Discr. means discriminant.
In the next two rows, the table shows the sizes of the gene and GO term lists obtained as the intersection between the respective discriminant lists and the benchmark lists built for Parkinson's disease. Comm. means common.
In the last two rows, the table shows the coverage of the benchmark gene and GO term lists by the respective discriminant lists. For KDVS, the coverage was calculated only for benchmark MF terms, according to the nature of the classification task. The benchmark gene list consists of 444 elements. The benchmark GO term list consists of 2121 terms in total, including 446 MF terms. Bench. cov. means benchmark coverage.