Marco Masseroli, Barend Mons, Erik Bongcam-Rudloff, Stefano Ceri, Alexander Kel, François Rechenmann, Frederique Lisacek, Paolo Romano.
Abstract
Many efforts exist to design and implement approaches and tools for data capture, integration and analysis in the life sciences. The challenges are not only the heterogeneity, size and distribution of information sources, but also the danger of producing too many solutions for the same problem. Methodological, technological, infrastructural and social aspects appear to be essential for the development of a new generation of best practices and tools. In this paper, we analyse and discuss these aspects from different perspectives, by extending some of the ideas that arose during the NETTAB 2012 Workshop, with particular reference to the European context. First, the relevance of using data and software models for the management and analysis of biological data is stressed. Second, some of the most relevant community achievements of recent years, which should be taken as a starting point for future efforts in this research domain, are presented. Third, some of the main outstanding issues, challenges and trends are analysed. In particular, we discuss the challenges related to the tendency to fund and create large-scale international research infrastructures and public-private partnerships in order to address the complex challenges of data-intensive science. The needs and opportunities of Genomic Computing (the integration, search and display of genomic information at a very specific level, e.g. at the level of a single DNA region) are then considered. In the current data- and network-driven era, social aspects can become crucial bottlenecks. How these may best be tackled to unleash the technical abilities for effective data integration and validation efforts is then discussed. In particular, the apparent lack of incentives for already overwhelmed researchers appears to limit the sharing of information and knowledge with other scientists.
We also point out how the bioinformatics market is growing at an unprecedented speed due to the impact that new powerful in silico analysis promises to have on better diagnosis, prognosis, drug discovery and treatment, towards personalized medicine. Finally, an open business model for bioinformatics, which appears able to reduce undue duplication of efforts and support the increased reuse of valuable data sets, tools and platforms, is discussed.
Year: 2014 PMID: 24564249 PMCID: PMC4015876 DOI: 10.1186/1471-2105-15-S1-S2
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169
Figure 1. Intended interactions between EDAM ontology and BioXSD schema. The semantics layer supports searching by end users, as well as automated reasoning. Both these tasks leverage shared ontologies. The syntax layer supports actual interoperability between tools, as well as programmatic access; both tasks leverage common exchange formats and schemas. The two layers are made consistent by proper ontology-based annotation of data and services.
Main Biological and Medical Science topics with high potential and merit for future Horizon 2020 actions.
| High potential Biological and Medical Science topics |
|---|
| Integrated disease and phenotype ontologies and supporting tools |
| Molecular profile reference databases for cells and tissues |
| European infrastructure for genome research |
| European animal genomics and phenomics infrastructure |
| An integrating activity for fish genome resources |
| Trans-national infrastructure for plant genomic science |
| European proteomics research infrastructure |
| Integration of national non mammalian model animal facilities on the European level |
| European primate network: maintaining and developing best practice, staff education and international standards in biological and biomedical research |
| Cyber-infrastructure for farmed and companion livestock |
| An integrated technology platform for high-throughput, multi-level phenotyping research to design robust farm animals for tomorrow |
| Network of animal biological resources centers |
| Aquaculture infrastructures for excellence in EU fish research |
| European network of high containment animal facilities to improve control of livestock transboundary and zoonotic infectious diseases |
| European seed bank research infrastructure |
| Forest tree genetic resources, a pan-European patrimony to be maintained and developed for the benefit of the scientific community |
| Improved access of the scientific community to collections of non pathogenic, pathogenic, emerging and clinical human/animal virus isolates (including fish and arthropods) up to biohazard risk group 4 |
| Facilities, resources and services for mining the nature and relevance of biocide resistance |
| Pan-European resource for gene transfer vectors towards clinical application |
Figure 2. The basic workflow of data-driven science. The figure depicts the general principle that a data exchange platform should enable and support. A newly generated data set is combined with other data sets (ideally all core legacy information of relevance), and new insights are derived from that data integration and modelling, including through complicated processes such as multi-omics data integration, multi-scale modelling, computer reasoning and inference. To this end, users should be allowed to upload their (novel) data and run standard workflows of choice on the combined data.
Figure 3. Integration of scientific data. Relevant scientific data that constitute a central core of biological information requested by almost all domains could be made available in an interoperable format to make their direct integration, comparison and modelling with domain-specific data possible.
Global bioinformatics market by submarket.
| Segment | 2007 | 2008 | 2009 | 2014 |
|---|---|---|---|---|
| Tools | 659.10 | 850.30 | 1,099.20 | 4,071.90 |
| Content/database | 948.40 | 1,133.70 | 1,358.50 | 3,439.20 |
| Services | 222.20 | 276.50 | 345.10 | 1,093.00 |
Values for 2007-2014 are in millions of dollars (from: Business Insights, Ltd. report).
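To make the growth implied by the table above concrete, the following minimal Python sketch computes a compound annual growth rate (CAGR) per segment between 2007 and 2014. CAGR is not a metric used in the source; it is chosen here only as one simple way to summarise the figures. The names `MARKET`, `cagr`, `segment_cagr` and `total` are hypothetical, and the 2014 Content/database entry, printed as "3.439.20", is assumed to mean 3,439.20.

```python
# Figures transcribed from the "Global bioinformatics market by submarket"
# table, in million dollars.
# Assumption: the 2014 Content/database entry is read as 3,439.20.
MARKET = {
    "Tools": {2007: 659.10, 2008: 850.30, 2009: 1099.20, 2014: 4071.90},
    "Content/database": {2007: 948.40, 2008: 1133.70, 2009: 1358.50, 2014: 3439.20},
    "Services": {2007: 222.20, 2008: 276.50, 2009: 345.10, 2014: 1093.00},
}


def cagr(start_value, end_value, years):
    """Compound annual growth rate as a fraction (0.10 means 10% per year)."""
    return (end_value / start_value) ** (1.0 / years) - 1.0


def segment_cagr(segment, start_year=2007, end_year=2014):
    """CAGR of one market segment between two years present in the table."""
    series = MARKET[segment]
    return cagr(series[start_year], series[end_year], end_year - start_year)


def total(year):
    """Sum of all three segments for a given year."""
    return sum(series[year] for series in MARKET.values())


if __name__ == "__main__":
    for segment in MARKET:
        print(f"{segment}: {segment_cagr(segment):.1%} per year")
    overall = cagr(total(2007), total(2014), 2014 - 2007)
    print(f"All segments: {overall:.1%} per year")
```

Under these assumptions, the Tools, Content/database and Services segments grow by roughly 30%, 20% and 26% per year respectively over the seven-year span.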
User classes of open source bioinformatics platforms and main reasons why they will be interested in the platforms.
| Users | Reasons to be interested in open source bioinformatics platforms |
|---|---|
|  | Convenient tools and utilities for creating new modules |
|  | Ready-to-use libraries of classes for working with the main bioinformatics and systems biology objects (e.g. sequences, genes, networks, etc.) |
|  | Ready integration with all main databases that are needed for working with new modules |
|  | Ability to upload personal modules to the platform and set their licensing policy (free, or commercial through an application store) |
|  | Convenient unified environment that combines a variety of programs and algorithms in different ways, as may be necessary for the analysis of different data from laboratory tests |
|  | Unified interface for all modules of the platform, which eases the training of end users |
|  | Convenient system that can use several programming languages and statistical packages for creating scripts, which bioinformaticians can prepare for later use in processing large amounts of routine data |
|  | Convenient system for constructing work procedures that automatically execute a given sequence of programs; once created, the procedures are passed to end users for automated processing of new data |
|  | Availability of a large number of ready-to-use modules for different branches of bioinformatics, systems biology and computer-aided drug modelling |
|  | User-friendly interfaces |
|  | Ability to create a personalized structured data repository "in the cloud", with data of different origins (e.g. transcriptomics, proteomics, etc.) |
|  | Ability to provide reproducible research |
|  | Ready-to-use operating procedures for automatic execution of given sequences of programs that can answer dedicated biological questions |