Pietro Hiram Guzzi, Giuseppe Agapito, Maria Teresa Di Martino, Mariamena Arbitrio, Pierfrancesco Tassone, Pierosandro Tagliaferri, Mario Cannataro.
Abstract
BACKGROUND: Clinical bioinformatics is a growing field based on the integration of clinical and omics data, with the aim of developing personalized medicine. Novel technologies able to investigate the relationship between clinical states and biological machinery may therefore help the field advance. For instance, the Affymetrix DMET (Drug Metabolism Enzymes and Transporters) platform studies the relationship between variation in patients' genomes and drug metabolism by detecting SNPs (single nucleotide polymorphisms) in genes related to drug metabolism. In pharmacogenomics and clinical studies, this may make it possible to find genetic variants in patients who respond differently to drugs. Despite this, open-source algorithms and tools for the analysis of DMET data are currently lacking. Existing software tools for DMET data generally support only the preprocessing of binary data (e.g. the DMET-Console provided by Affymetrix) and simple data analysis operations; they do not allow testing the association between the presence of SNPs and the response to drugs.
Year: 2012 PMID: 23035929 PMCID: PMC3496574 DOI: 10.1186/1471-2105-13-258
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169
Figure 1. Workflow of a clinical bioinformatics experiment, from sample collection to data analysis. Workflow of data in a typical experiment.
DMET data format
| C/C | C/C | T/T | T/T |
| G/C | C/C | -/T | A/A |
| C/T | C/T | C/T | C/T |
| G/G | A/G | G/G | G/G |
The table shows an example of an input table produced at the end of the DMET-Console workflow. Each row represents a probe and each column represents a subject, each identified by its own identifier. The first row of the file must contain a list of identifiers: the probe_set identifier in the first column and the subject identifiers in the subsequent ones. Cell (i, j) contains the genotype detected for the j-th subject at the i-th probe_set, as identified in the previous analysis.
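The layout described above can be read with a few lines of Python. This is only a minimal sketch: it assumes a tab-delimited text file, and the probe identifiers (`probe_1`, ...), the subject identifiers (`S1`, ...), the header label `probeset_id`, and the function name `parse_dmet_table` are all hypothetical, chosen for illustration. Only the genotype calls come from the example table.

```python
import csv
import io


def parse_dmet_table(text, delimiter="\t"):
    """Parse a genotype table: a header row of subject identifiers,
    followed by one row per probe (probe identifier in the first column).

    Returns (subjects, genotypes), where genotypes[probe][subject] is the call.
    """
    reader = csv.reader(io.StringIO(text), delimiter=delimiter)
    header = next(reader)
    subjects = header[1:]  # first header cell labels the probe column
    genotypes = {}
    for row in reader:
        if not row:
            continue  # skip blank lines
        probe_id, calls = row[0], row[1:]
        if len(calls) != len(subjects):
            raise ValueError(f"row {probe_id!r} has {len(calls)} calls, "
                             f"expected {len(subjects)}")
        genotypes[probe_id] = dict(zip(subjects, calls))
    return subjects, genotypes


# Hypothetical identifiers; genotype calls taken from the example table.
example = (
    "probeset_id\tS1\tS2\tS3\tS4\n"
    "probe_1\tC/C\tC/C\tT/T\tT/T\n"
    "probe_2\tG/C\tC/C\t-/T\tA/A\n"
    "probe_3\tC/T\tC/T\tC/T\tC/T\n"
    "probe_4\tG/G\tA/G\tG/G\tG/G\n"
)

subjects, genotypes = parse_dmet_table(example)
print(subjects)
print(genotypes["probe_2"])
```

Indexing by probe first and subject second mirrors the row and column roles of the table, so `genotypes[probe][subject]` corresponds to cell (i, j).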
Figure 2. Workflow of an analysis experiment through the software. The figure shows the execution workflow of a typical analysis. First, the user loads data into the software, as depicted in the upper left corner (Figure 2a). The user then assigns the correct class to each sample (Figure 2b) and chooses the analysis method (Figure 2c). The software calculates the frequency of each allele for each probe. DMET Analyzer then computes Fisher's tests and shows the results in a new window, in which probes may be sorted alphabetically or by p-value (Figure 2d). The user can select a SNP in this table and visualize its annotation data by clicking on the SNP identifier (Figure 2e). Similarly, the user may visualize the distribution of variants using the embedded visualizer (Figure 2f).
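The counting, testing, and sorting steps of this workflow can be sketched as follows. This is an illustration under stated assumptions, not DMET Analyzer's actual implementation: it assumes exactly two classes, tests each genotype separately with a two-sided Fisher's exact test on a 2x2 table (genotype present or absent, by class), and uses hypothetical names throughout (`fisher_exact_2x2`, `rank_probes`, the subject and probe identifiers, and the class labels).

```python
from collections import Counter
from math import comb


def fisher_exact_2x2(a, b, c, d):
    """Two-sided Fisher's exact test p-value for the table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    total_tables = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of x in the top-left cell, margins fixed.
        return comb(row1, x) * comb(row2, col1 - x) / total_tables

    p_observed = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Sum over all tables no more probable than the observed one.
    p = sum(prob(x) for x in range(lo, hi + 1)
            if prob(x) <= p_observed * (1 + 1e-9))
    return min(1.0, p)


def rank_probes(genotypes, classes, class_a, class_b):
    """Return (probe, genotype, p_value) tuples sorted by p-value.

    Assumes every subject belongs to class_a or class_b.
    """
    results = []
    for probe, calls in genotypes.items():
        # Per-class genotype counts for this probe.
        counts = {class_a: Counter(), class_b: Counter()}
        for subject, call in calls.items():
            counts[classes[subject]][call] += 1
        n_a = sum(counts[class_a].values())
        n_b = sum(counts[class_b].values())
        for genotype in sorted(set(calls.values())):
            a = counts[class_a][genotype]
            c = counts[class_b][genotype]
            p = fisher_exact_2x2(a, n_a - a, c, n_b - c)
            results.append((probe, genotype, p))
    return sorted(results, key=lambda r: (r[2], r[0], r[1]))


# Hypothetical identifiers and class labels; calls follow the example table.
genotypes = {
    "probe_1": {"S1": "C/C", "S2": "C/C", "S3": "T/T", "S4": "T/T"},
    "probe_2": {"S1": "G/C", "S2": "C/C", "S3": "-/T", "S4": "A/A"},
    "probe_3": {"S1": "C/T", "S2": "C/T", "S3": "C/T", "S4": "C/T"},
    "probe_4": {"S1": "G/G", "S2": "A/G", "S3": "G/G", "S4": "G/G"},
}
classes = {"S1": "responder", "S2": "responder",
           "S3": "non_responder", "S4": "non_responder"}

results = rank_probes(genotypes, classes, "responder", "non_responder")
for probe, genotype, p in results:
    print(f"{probe}\t{genotype}\t{p:.3f}")
```

With only four subjects no p-value can be small; the example is meant to show the data flow, with the probe whose genotypes separate the two classes ranked first.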
Figure 3. Memory occupancy and execution times. The figure shows the execution time and the total amount of memory required for datasets of growing size. We took these measurements on ten datasets ranging from 100 to 1000 patients in steps of 100. The results show that the implementation of DMET Analyzer and its algorithmic choices allow these datasets to be processed in approximately the same time and with approximately the same memory (except for the initial loading of files).
Figure 4. Comparison with existing tools. Comparison of DMET Analyzer with existing software tools, considering a typical analysis workflow. Data produced by the DMET platform may be preprocessed using apt-dmet-genotype and then given as input to DMET-Console to be transformed into a format readable by other software. Alternatively, DMET-Console may perform both steps. The data may then be processed by statistical tools after some manual steps. In contrast, our software performs all the final steps automatically.