Margherita Squillario, Matteo Barbieri, Alessandro Verri, Annalisa Barla.
Abstract
Biological interpretability is a key requirement for the output of microarray data analysis pipelines. The most widely used pipeline first identifies a gene signature from the acquired measurements and then uses gene enrichment analysis to functionally characterize the obtained results. Recently, Knowledge Driven Variable Selection (KDVS), an alternative approach that performs both steps at the same time, has been proposed. In this paper, we assess the effectiveness of KDVS against standard approaches on a Parkinson's Disease (PD) dataset. The presented quantitative analysis is made possible by the construction of a reference list of genes and gene groups associated with PD. Our work shows that KDVS is much more effective than the standard approach in enhancing the interpretability of the obtained results.
Keywords: KDVS; Parkinson’s disease; established domain knowledge; functional characterization; gene expression; gene ontology; sparse regularization; variable selection
Year: 2016 PMID: 27600081 PMCID: PMC5003491 DOI: 10.3390/microarrays5020015
Source DB: PubMed Journal: Microarrays (Basel) ISSN: 2076-3905
Figure 1. Knowledge Driven Variable Selection (KDVS) and standard pipelines. KDVS embeds the Gene Ontology (GO) domain knowledge into the variable selection step, providing as output a list of discriminant GO terms and genes. The standard pipeline, instead, first selects a gene signature and then performs an enrichment analysis in GO, obtaining a discriminant GO term list.
Figure 2. Workflow used to obtain the benchmark gene and GO term lists. The benchmark gene list is composed of 444 genes and the benchmark GO term list is composed of 2121 terms: 1447 from Biological Process (BP), 446 from Molecular Function (MF) and 228 from Cellular Component (CC).
Table 1. Top performing methods for the standard pipeline. For each method, the average test error, standard deviation (SD), and Matthews correlation coefficient (MCC) are reported.
| Experiment | Test Error ± SD (%) | MCC |
|---|---|---|
| | 23.1 ± 8.6 | 0.54 |
| | 22.0 ± 9.7 | 0.56 |
| | 22.0 ± 8.2 | 0.56 |
| | 24.6 ± 7.1 | 0.51 |
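As a reminder of how the two columns above relate to a classifier's confusion matrix, the following minimal sketch computes the error rate and the MCC from raw counts. The counts are hypothetical and chosen only for illustration; they are not taken from the paper, and the function names are assumptions of this sketch.

```python
import math

def error_rate(tp, tn, fp, fn):
    """Fraction of misclassified samples."""
    return (fp + fn) / (tp + tn + fp + fn)

def mcc(tp, tn, fp, fn):
    """Matthews correlation coefficient from confusion-matrix counts."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        # Common convention when a whole row or column of the matrix is empty.
        return 0.0
    return (tp * tn - fp * fn) / denom

# Hypothetical counts for a single test split (not data from the paper).
tp, tn, fp, fn = 8, 7, 3, 2
print(f"test error = {100 * error_rate(tp, tn, fp, fn):.1f}%")
print(f"MCC        = {mcc(tp, tn, fp, fn):.2f}")
```

Unlike the error rate, the MCC uses all four cells of the confusion matrix, so it remains informative when the two classes are unbalanced.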
Table 2. Selection performance of Knowledge Driven Variable Selection (KDVS) and five different instances of the standard pipeline vs. the benchmark. Precision, Recall and F-measure are reported for KDVS, the best four methods of Table 1 and the t-test, for GO terms and genes.
| Experiments | GO terms: Precision (%) | GO terms: Recall (%) | GO terms: F-measure (‰) | Genes: Precision (%) | Genes: Recall (%) | Genes: F-measure (‰) |
|---|---|---|---|---|---|---|
| | 44.0 | 12.7 | | 7.5 | 25.5 | |
| | 71.4 | 0.2 | 4.8 | 10.4 | 1.1 | 20.4 |
| | 50.0 | 0.1 | 1.0 | 3.5 | 0.5 | 8.0 |
| | 50.0 | 0.1 | 2.8 | 18.8 | 0.7 | 13.1 |
| | 62.5 | 0.2 | 4.8 | 16.7 | 0.9 | 17.1 |
| | 50.0 | 0.1 | 1.0 | 2.5 | 0.2 | 4.2 |
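The comparison against the benchmark can be sketched as a set overlap: precision is the fraction of selected items found in the benchmark, recall is the fraction of the benchmark that was selected, and the F-measure is assumed here to be their harmonic mean. The identifiers below are placeholders, not genes or GO terms from the study.

```python
def f_measure(precision, recall):
    """Harmonic mean of precision and recall (both given as fractions)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

def selection_scores(selected, benchmark):
    """Precision, recall and F-measure of a selected list against a benchmark list."""
    selected, benchmark = set(selected), set(benchmark)
    hits = len(selected & benchmark)
    precision = hits / len(selected) if selected else 0.0
    recall = hits / len(benchmark) if benchmark else 0.0
    return precision, recall, f_measure(precision, recall)

# Placeholder identifiers (hypothetical, for illustration only).
selected = ["g1", "g2", "g3", "g4"]
benchmark = ["g2", "g4", "g5", "g6", "g7"]
p, r, f = selection_scores(selected, benchmark)
print(f"precision = {p:.2f}, recall = {r:.2f}, F = {f:.3f}")

# Scale check on one reported row: precision 10.4% and recall 1.1%.
print(f"F = {f_measure(0.104, 0.011):.4f}")
```

Under this assumption, a precision of 10.4% with a recall of 1.1% gives an F-measure of roughly 0.02, which matches the reported value of 20.4 only if the F-measure column is read on a per-thousand scale (allowing for rounding of the inputs).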
Figure 3. ROC curves for the three GO domains. The plots show the ROC curves (Sensitivity vs. Specificity) for the KDVS GO terms, for varying values of the error threshold. The highlighted point on each curve is the one with the highest F-measure, reported in the green box.
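A threshold sweep of this kind can be sketched as follows. The sketch assumes that a GO term is selected when its error estimate is at or below the threshold, and it uses the textbook definitions of sensitivity, TP / (TP + FN), and specificity, TN / (TN + FP). The error values, benchmark labels and thresholds are hypothetical.

```python
def roc_points(errors, in_benchmark, thresholds):
    """For each threshold, select terms with error <= threshold and score the selection."""
    points = []
    for t in thresholds:
        tp = fp = tn = fn = 0
        for err, positive in zip(errors, in_benchmark):
            selected = err <= t
            if selected and positive:
                tp += 1
            elif selected:
                fp += 1
            elif positive:
                fn += 1
            else:
                tn += 1
        sensitivity = tp / (tp + fn) if tp + fn else 0.0
        specificity = tn / (tn + fp) if tn + fp else 0.0
        precision = tp / (tp + fp) if tp + fp else 0.0
        f = (2 * precision * sensitivity / (precision + sensitivity)
             if precision + sensitivity else 0.0)
        points.append((t, sensitivity, specificity, f))
    return points

# Hypothetical per-term error estimates and benchmark membership flags.
errors = [0.10, 0.20, 0.25, 0.30, 0.40, 0.45]
in_benchmark = [True, True, False, True, False, False]

points = roc_points(errors, in_benchmark, thresholds=[0.15, 0.25, 0.35, 0.50])
best = max(points, key=lambda p: p[3])  # point with the highest F-measure
print("threshold with highest F-measure:", best[0])
```

Each tuple gives one point of the curve, and the tuple with the largest F-measure plays the role of the highlighted point.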