Alina Sîrbu, Martin Crane, Heather J Ruskin.
Abstract
Microarray technologies have been the basis of numerous important findings regarding gene expression in the last few decades. Studies have generated large amounts of data describing various processes, which, due to the existence of public databases, are widely available for further analysis. Given their lower cost and higher maturity compared to newer sequencing technologies, these data continue to be produced, even though data quality has been the subject of some debate. However, given the large volume of data generated, integration can help overcome some issues related, e.g., to noise or reduced time resolution, while providing additional insight into features not directly addressed by sequencing methods. Here, we present an integration test case based on public Drosophila melanogaster datasets (gene expression, binding site affinities, known interactions). Using an evolutionary computation framework, we show how integration can enhance the ability to recover transcriptional gene regulatory networks from these data, and indicate which data types are more important for quantitative and qualitative network inference. Our results show a clear improvement in performance when multiple datasets are integrated, indicating that microarray data will remain a valuable and viable resource for some time to come.
Keywords: data integration; gene regulatory networks; microarrays; reverse engineering; transcriptional regulation
Year: 2015 PMID: 27600224 PMCID: PMC4996389 DOI: 10.3390/microarrays4020255
Source DB: PubMed Journal: Microarrays (Basel) ISSN: 2076-3905
Set of 27 genes selected for network analysis for the Drosophila melanogaster dataset.
| Gene names | | | | | | | | |
|---|---|---|---|---|---|---|---|---|
| arm | bcd | cad | CrebA | Egfr | en | eve | ftz | fz |
| gt | hb | hkb | how | ken | Kr | L | Mef2 | mxc |
| noc | os | pnr | ras | smo | sna | Tl | tor | twi |
Usage of each dataset at the different analysis stages.
| Section | Analysis stage | Mechanism | SC | DC | BSA | KO | Corr | GO | DROID |
|---|---|---|---|---|---|---|---|---|---|
| | Model extraction | Time series | ✓ | | | | | | |
| | | NSEx | | | ✓ | ✓ | ✓ | ✓ | |
| | | NSEv | | | | | | | |
| | Model evaluation | Qualitative | | | | | | | ✓ |
| | | Quantitative | | ✓ | | | | | |
| | Model extraction | Time series | ✓ | | | | | | |
| | | NSEx | | | ✓ | ✓ | ✓ | ✓ | |
| | | NSEv | | | ✓ | ✓ | ✓ | ✓ | |
| | Model evaluation | Qualitative | | | | | | | ✓ |
| | | Quantitative | | ✓ | | | | | |
| | Model extraction | Time series | ✓ | ✓ | | | | | |
| | | NSEx | | | ✓ | ✓ | ✓ | ✓ | |
| | | NSEv | | | ✓ | | | | |
| | Model evaluation | Qualitative | | | | | | | ✓ |
| | | Quantitative | | | | | | | |
Algorithm incorporating NSEx. Qualitative results: AUROC and AUPR values obtained after 10 runs with each algorithm and, in parentheses, standard deviations for subsets of 9 runs (see Section 2.3 for details on how these were computed). Variants: SC (SC time series only, without integration of additional data), SC+NSEx.KO (using knock-out experiments for NSEx), SC+NSEx.GO (using GO annotations for NSEx), SC+NSEx.BSA (using binding site affinities for NSEx), SC+NSEx.CORR (using gene-correlations for NSEx) and SC+NSEx.ALL (using all data for NSEx). For additional datasets, BSA followed by KO lead to improved sets of interactions, while CORR affects selection adversely. However, the combined effect of all data types provides optimal inference of the interaction set.
| Algorithm | SC | SC+NSEx.KO | SC+NSEx.GO | SC+NSEx.BSA | SC+NSEx.CORR | SC+NSEx.ALL |
|---|---|---|---|---|---|---|
| AUROC | 0.603 (0.017) | 0.610 (0.010) | 0.593 (0.022) | 0.677 (0.021) | 0.544 (0.016) | 0.744 (0.018) |
| AUPR | 0.037 (0.002) | 0.045 (0.003) | 0.034 (0.002) | 0.046 (0.003) | 0.036 (0.001) | 0.066 (0.006) |
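The AUROC and AUPR figures in the table above can be illustrated with a minimal sketch. It assumes each run yields one confidence score per candidate interaction and that a gold standard marks each candidate as a known interaction (1) or not (0). Section 2.3 is not reproduced here, so the treatment of "subsets of 9 runs" as leave-one-out consensus networks is only one plausible reading, and all function names are hypothetical.

```python
from statistics import mean, stdev

def auroc(scores, labels):
    """Rank-based AUROC (Mann-Whitney U); tied scores share an average rank."""
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    ranks = [0.0] * len(scores)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # 1-based average rank of the tie group
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    rank_sum = sum(r for r, y in zip(ranks, labels) if y)
    return (rank_sum - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)

def aupr(scores, labels):
    """Average precision: mean precision at each known interaction, by rank.
    Assumes at least one positive label; ties are broken by input order."""
    ranked = sorted(zip(scores, labels), key=lambda p: -p[0])
    hits, total = 0, 0.0
    for k, (_, y) in enumerate(ranked, start=1):
        if y:
            hits += 1
            total += hits / k
    return total / hits

def consensus(runs):
    """Average the edge scores across runs (assumed aggregation rule)."""
    return [mean(col) for col in zip(*runs)]

def leave_one_out_sd(runs, labels, metric):
    """SD of a metric over all subsets that omit exactly one run (assumption)."""
    values = [metric(consensus(runs[:i] + runs[i + 1:]), labels)
              for i in range(len(runs))]
    return stdev(values)
```

With 10 runs, `leave_one_out_sd` evaluates 10 subsets of 9 runs each. Average precision is used here as a common estimator of the area under the precision-recall curve; other interpolation schemes give slightly different values.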
Figure 1. Algorithm enhanced with NSEx: quantitative results. The graph shows the distribution (over 10 runs) of RMSE on test data (DC dataset) for models obtained with algorithm variants (as for Table 3). A t-test performed for each enhanced version to compare performance to that of the basic SC variant gave p-values as shown. No significant change was observed in RMSE values after integration.
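As a rough illustration of the quantitative comparison described in this caption, the sketch below computes RMSE on held-out data and a two-sample t statistic between two sets of per-run RMSE values. The caption does not say which t-test variant was used; Welch's unequal-variance form is an assumption, and the function names are hypothetical.

```python
from math import sqrt
from statistics import mean, variance

def rmse(predicted, observed):
    """Root mean squared error between simulated and measured expression."""
    return sqrt(mean([(p - o) ** 2 for p, o in zip(predicted, observed)]))

def welch_t(a, b):
    """Welch's t statistic and degrees of freedom for two samples
    (assumed variant; the source only states that a t-test was used)."""
    va = variance(a) / len(a)  # squared standard error of sample a
    vb = variance(b) / len(b)  # squared standard error of sample b
    t = (mean(a) - mean(b)) / sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df
```

Here `a` and `b` would hold the 10 per-run RMSE values of two algorithm variants. Turning `t` and `df` into a p-value needs the Student t distribution, which the Python standard library does not provide; `scipy.stats.ttest_ind(a, b, equal_var=False)` is one way to obtain it.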
Figure 2. Algorithm enhanced with NSEx and NSEv: quantitative results compared to SC+NSEx and SC only. The variants are: SC (time series only, without integration of additional data), SC+NSEx.ALL (using all data for NSEx), SC+NSEx.ALL+NSEv.ALL (using all data for both NSEx and NSEv) and SC+NSEx.ALL+NSEv.BSA (using all data for NSEx, but BSA only for NSEv). RMSE values show improvement compared to the previous integration strategy; small differences between NSEv.ALL and NSEv.BSA are observed. This suggests that including all data in NSEx scoping with BSA data for refinement in NSEv is optimal.
Algorithm enhanced with NSEx and NSEv: qualitative results compared to SC+NSEx and SC only. AUROC and AUPR values obtained after 10 runs with each algorithm are shown, together with standard deviations for subsets of 9 runs in parentheses (see Section 2.3). Variants are SC (time series only, without integration of additional data), SC+NSEx.ALL (using all data for NSEx), SC+NSEx.ALL+NSEv.ALL (using all data for both NSEx and NSEv) and SC+NSEx.ALL+NSEv.BSA (using all data for NSEx, but BSA only for NSEv). Integrating all data at the evaluation stage decreases the quality of interactions compared to those obtained with NSEx. Use of BSA alone for evaluation yields better results.
| Algorithm | SC | SC+NSEx.ALL | SC+NSEx.ALL+NSEv.ALL | SC+NSEx.ALL+NSEv.BSA |
|---|---|---|---|---|
| AUROC | 0.603 (0.017) | 0.744 (0.018) | 0.700 (0.027) | 0.764 (0.023) |
| AUPR | 0.037 (0.002) | 0.066 (0.006) | 0.049 (0.003) | 0.086 (0.003) |
Figure 3. Combining the two time course datasets, SC and DC. AUROC and AUPR values (and standard deviations for subsets of models) for gene connections obtained through integration scheme NSEx.ALL+NSEv.BSA are displayed (see Section 2.3). Shown are the SC dataset alone, SC integration (SC+NSEx.ALL+NSEv.BSA) and SC+DC integration (SC+DC+NSEx.ALL+NSEv.BSA). Overall improvement is ∼20% with the combined data and integration scheme specified.
Combining the two time course datasets, SC and DC. AUROC and AUPR values (and standard deviations) for gene connections obtained through integration scheme NSEx.ALL+NSEv.BSA are displayed. Shown are the SC dataset alone, SC integration (SC+NSEx.ALL+NSEv.BSA) and SC+DC integration (SC+DC+NSEx.ALL+NSEv.BSA).
| Algorithm | SC | SC+NSEx.ALL+NSEv.BSA | SC+DC+NSEx.ALL+NSEv.BSA |
|---|---|---|---|
| AUROC | 0.603 (0.017) | 0.764 (0.023) | |
| AUPR | 0.037 (0.002) | 0.086 (0.003) | |