Peter Li, Juan I Castrillo, Giles Velarde, Ingo Wassink, Stian Soiland-Reyes, Stuart Owen, David Withers, Tom Oinn, Matthew R Pocock, Carole A Goble, Stephen G Oliver, Douglas B Kell.
Abstract
BACKGROUND: There has been a dramatic increase in the amount of quantitative data derived from the measurement of changes at different levels of biological complexity during the post-genomic era. However, there are a number of issues associated with the use of computational tools employed for the analysis of such data. For example, computational tools such as R and MATLAB require prior knowledge of their programming languages in order to implement statistical analyses on data. Combining two or more tools in an analysis may also be problematic since data may have to be manually copied and pasted between separate user interfaces for each tool. Furthermore, this transfer of data may require a reconciliation step in order for there to be interoperability between computational tools.
Year: 2008 PMID: 18687127 PMCID: PMC2528018 DOI: 10.1186/1471-2105-9-334
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169
Figure 1. A diagram of the microarray data analysis workflow. Data are retrieved from the maxdLoad2 database via the maxdBrowse web service based on user selection criteria implemented using Beanshell scripts. These data are merged and then analysed by R to identify differentially-expressed genes. These genes are then annotated with common terms from the Gene Ontology using the GOTermFinder tool.
Figure 2. A screenshot of the Taverna service palette. The palette shows the RShell processor and the service operations available from the maxdBrowse web service interface to the maxdLoad2 database.
Figure 3. A schematic diagram showing the relationship between Taverna, the RShell processor, RServe and the R tool.
Figure 4. Relationship between input and output ports and variables in R scripts. All data items required for analysis and parameterisation are passed into the R script by declaring them as variables named after the input ports. Three input variables have been defined for the RShell processor in the example workflow: the two sets of microarray data undergoing t-test analysis, 'control_csv' and 'test_csv', and 'pvalue', which defines the threshold used for the t-tests. The results of the R analysis can be passed out into the remainder of the workflow by declaring them as variables named after the output ports. The example RShell processor in the workflow contains only one output, a CSV file containing the differentially-expressed genes identified by the t-tests.
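The port-to-variable convention described in the Figure 4 caption can be illustrated with a small sketch. This is an analogy written in Python using only the standard library, not the RShell processor or any Taverna or RServe API. The input names 'control_csv', 'test_csv' and 'pvalue' come from the caption; the output port name 'result_csv', the helper functions, and the sample data are hypothetical. The p-value is computed from a Welch t statistic with a normal approximation, which is a simplification of the t-test that R would perform.

```python
import csv
import io
from statistics import NormalDist, mean, variance


def parse_csv(text):
    """Parse rows of the form 'gene,v1,v2,...' into {gene: [floats]}."""
    rows = csv.reader(io.StringIO(text.strip()))
    return {row[0]: [float(v) for v in row[1:]] for row in rows if row}


def approx_p(a, b):
    """Two-sided p-value from a Welch t statistic.

    Assumption: uses a normal approximation instead of the t distribution,
    so it only stands in for a real t-test.
    """
    se = (variance(a) / len(a) + variance(b) / len(b)) ** 0.5
    if se == 0:
        return 1.0 if mean(a) == mean(b) else 0.0
    t = (mean(a) - mean(b)) / se
    return 2 * (1 - NormalDist().cdf(abs(t)))


# Stand-in for the script held by the processor. It refers to its inputs and
# its output only by port name ('result_csv' is a hypothetical output port).
SCRIPT = """
control = parse_csv(control_csv)
test = parse_csv(test_csv)
hits = [g for g in control if g in test and approx_p(control[g], test[g]) < pvalue]
result_csv = "\\n".join(hits)
"""


def run_processor(script, input_ports, output_ports):
    """Bind input ports to variables, run the script, read outputs by name."""
    namespace = {"parse_csv": parse_csv, "approx_p": approx_p}
    namespace.update(input_ports)   # each input port becomes a variable
    exec(script, namespace)         # the script sees only named variables
    return {name: namespace[name] for name in output_ports}


# Hypothetical sample data: geneA differs between conditions, geneB does not.
outputs = run_processor(
    SCRIPT,
    {
        "control_csv": "geneA,1.0,1.1,0.9\ngeneB,2.0,2.1,1.9",
        "test_csv": "geneA,5.0,5.2,4.9\ngeneB,2.0,1.9,2.1",
        "pvalue": 0.05,
    },
    ["result_csv"],
)
print(outputs["result_csv"])
```

The design point is that the script body never reads from or writes to the workflow directly: the surrounding processor injects one variable per input port before execution and collects one variable per output port afterwards, so the same script can be reused with different upstream and downstream services.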
Figure 5. A screenshot showing the PDF report generated by the GOTermFinder web service during the execution of the microarray data analysis workflow. The PDF report is displayed using the PDFRenderer plugin for Taverna by right-clicking on the PDF file object and selecting 'View as PDF'.