Felix Dreher, Thomas Kreitler, Christopher Hardt, Atanas Kamburov, Reha Yildirimman, Karl Schellander, Hans Lehrach, Bodo M H Lange, Ralf Herwig.
Abstract
BACKGROUND: Modern biomedical research is often organized in collaborations involving labs worldwide. In systems biology in particular, complex molecular systems are analyzed whose explanation requires generating and interpreting heterogeneous data, ranging from gene expression studies and mass spectrometry measurements to experimental techniques for detecting molecular interactions and functional assays. XML has become the most prominent format for representing and exchanging these data. However, beyond the development of standards, there is still a fundamental lack of data integration systems that can use these exchange formats, organize the data in an integrative way, and link it with applications for data interpretation and analysis.
Year: 2012 PMID: 22568834 PMCID: PMC3424966 DOI: 10.1186/1471-2105-13-85
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169
Figure 1. XML code for a custom data type: 'study'. This custom XML format represents data analysis results from transcriptome profiling based on DNA microarrays.
Standards initiatives and XML formats for different experimental technologies
| Experimental technology | Minimum information standard | XML format(s) | Standards initiative |
|---|---|---|---|
| Microarrays | MIAME | MAGE-ML, MINiML | MGED society |
| Mass spectrometry | MIAPE | mzML, mzData, mzXML | HUPO PSI-MS |
| Molecular interactions | MIMIx | PSI-MI | HUPO PSI-MI |
| In situ hybridization / Immunohistochemistry | MISFISHIE | MISFISHIE.dtd | MGED society |
| Cellular assays | MIACA | CAOM | MIACA Standards Initiative |
| Quantitative PCR | MIqPCR | RDML | RDML consortium |
| Genomic sequences | MIGS | RDM | Genomics standards consortium |
| Systems biology / Pathways | MIRIAM | SBML, CellML, BioPAX | Biomodels.net |
The example installation of DIPSBC incorporates many of these data formats and allows integrated indexing of primary and secondary data.
Figure 2. Graphical representation of the data processing workflow. Raw data are transformed to normalized XML files and indexed. The transformation is accomplished with Java or Perl parsers and XSLT. The integrity of XML files is ensured by XSD files. Normalized data sets are indexed with Solr/Lucene and can be queried via the web interface. 'Curl': command-line tool for the transfer of data from or to a server.
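As a rough illustration of the indexing step in this workflow, the sketch below serializes a normalized record into Solr's XML update message format (`<add><doc><field name="...">`). It is not the DIPSBC implementation, which uses Java or Perl parsers and XSLT; Python is used here only for brevity, and the function name `to_solr_add_xml` and all field names (`id`, `type`, `format`, `description`) are assumptions made for the example.

```python
import xml.etree.ElementTree as ET


def to_solr_add_xml(records):
    """Serialize dict records into a Solr XML <add> update message.

    Each dict becomes one <doc>; each key/value pair becomes one <field>.
    List or tuple values are emitted as repeated (multi-valued) fields.
    """
    add = ET.Element("add")
    for record in records:
        doc = ET.SubElement(add, "doc")
        for name, value in record.items():
            values = value if isinstance(value, (list, tuple)) else [value]
            for v in values:
                field = ET.SubElement(doc, "field", name=name)
                field.text = str(v)
    return ET.tostring(add, encoding="unicode")


# Hypothetical normalized record; the field names are illustrative only
# and do not reflect the actual DIPSBC index schema.
record = {
    "id": "GSE3325",
    "type": "microarray",
    "format": "MINiML",
    "description": "Prostate cancer study",
}

message = to_solr_add_xml([record])
print(message)
```

A message of this shape would typically be sent to the Solr update handler with a tool such as curl and then committed; the exact server URL depends on the installation.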
Index contents of the current DIPSBC example installation
| Data type | Source | Format | Number of entries | Description |
|---|---|---|---|---|
| Protein mass spectra | PRIDE acc. 8538 | mzData | 745 | Peptide tandem mass spectra (Homo sapiens) with identifications |
| DNA microarrays | GEO acc. GSE3325 | MINiML | 19 | Prostate cancer study; chip platform: Affymetrix U133 Plus 2.0 arrays (Homo sapiens) |
| | GEO acc. GSE1133 | MINiML | 438 | Novartis gene atlas 2004 (mouse and human arrays) |
| | GEO acc. GSE10204, GSE11193 | MINiML | 80 | Genetic functional basics of water-binding capacity in pork; chip platform: Affymetrix Porcine Whole Genome Array |
| Studies | MPI Berlin | XML 'study' | 7 | Summary tables of statistical analyses |
| Test result tables | MPI Berlin | STAT-ML | 94497 | Results of statistical analyses of microarray experiments |
| Microsatellite markers / phenotypes | University Bonn | XML 'pigs' | 873 | Pig marker and trait values |
| Molecular interactions | IntAct | PSI-MI | 5915 | Yeast-2-hybrid datasets from Rual et al. and Stelzl et al. |
| | CPDB | XML 'cpdb' | 46454 | Interactions involving genes, proteins, and compounds; source: ConsensusPathDB |
| Molecular Models | BioModels | SBML | 699 | Mathematical models of gene regulatory pathways |
| Synonyms pig | Affymetrix | XML 'synonyms' | 24123 | Pig genome annotations |
| Synonyms human | Affymetrix | XML 'synonyms' | 54675 | Homo sapiens genome annotations |
| Protein sequences | UniProt | FASTA | 16.5 million | Protein sequences (FASTA format) |
| Publications | PubMed | XML 'pubmed' | 18.2 million | Publications in PubMed starting from 1970 |
| Foswiki pages | DIPSBC | TXT | 26 | Web pages within the DIPSBC platform |
| Total number of entries | | | 34,970,538 | |
The index contains a large collection of different data types, including protein mass spectra, DNA microarray experiments, molecular interactions, protein sequences and PubMed abstracts, among others. In total, about 35 million records are indexed and thus searchable.
Figure 3. DIPSBC search interface. The result list for the example query keyword "E2F6" is shown. Different result data types are indicated by colored icons and are linked to respective helper applications.
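A keyword search of this kind ultimately maps to a query against the Solr index. The following minimal sketch builds such a request URL using Solr's standard `q`, `rows` and `wt` query parameters. The base URL and the helper name `build_query_url` are assumptions for illustration; the actual endpoint and request parameters used by the DIPSBC web interface are not given in the source.

```python
from urllib.parse import urlencode

# Hypothetical Solr select endpoint; a real installation will differ.
SOLR_SELECT_URL = "http://localhost:8983/solr/select"


def build_query_url(keyword, rows=10, base_url=SOLR_SELECT_URL):
    """Build a Solr select URL for a free-text keyword query.

    q    : the query string (here a single keyword)
    rows : maximum number of hits to return
    wt   : response writer; 'json' requests a JSON response
    """
    params = {"q": keyword, "rows": rows, "wt": "json"}
    return f"{base_url}?{urlencode(params)}"


url = build_query_url("E2F6")
print(url)
```

Because `urlencode` escapes the keyword, queries containing spaces or special characters remain valid URLs.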
Figure 4. Screenshot of the 'mzData viewer' applet. This helper application can be used to visually examine results of peptide mass spectrometry experiments.
Figure 5. Screenshot of the 'Argo Genome Browser' applet. This helper application [22] provides a graphical representation of genomic regions with the respective features and annotations.
Figure 6. Screenshot of the 'Graph Browser' applet. This helper application can be used to visualize protein-protein interaction networks. Nodes represent proteins, edges represent interactions. Nodes can be expanded and collapsed, and meta-data can be accessed by clicking on individual nodes.
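The node-and-edge model described in this caption can be sketched as an undirected graph in which "expanding" a node means collecting its neighbors up to a given depth. The code below is only an illustration of that idea, not the applet's implementation; the protein names `P1` to `P4` and the interaction list are made-up placeholders, and `build_graph` and `neighborhood` are hypothetical helper names.

```python
from collections import defaultdict, deque


def build_graph(interactions):
    """Build an undirected adjacency map from (protein, protein) pairs."""
    graph = defaultdict(set)
    for a, b in interactions:
        graph[a].add(b)
        graph[b].add(a)
    return graph


def neighborhood(graph, start, depth=1):
    """Return all nodes within `depth` edges of `start` (breadth-first)."""
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        node, dist = queue.popleft()
        if dist == depth:
            continue  # do not expand beyond the requested depth
        for nxt in graph.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, dist + 1))
    return seen


# Placeholder interactions, not taken from any real dataset.
interactions = [("P1", "P2"), ("P1", "P3"), ("P3", "P4")]
graph = build_graph(interactions)

print(sorted(neighborhood(graph, "P1", depth=1)))  # ['P1', 'P2', 'P3']
```

Raising `depth` corresponds to expanding further nodes in the view; with `depth=2` the placeholder node `P4` is included as well.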
Figure 7. Result page and network visualization for the APC gene. A) Top search results for an example query for the APC gene. Different experiment types are shown, ordered by relevance. B) Visualization of the interaction network around the APC gene.