SoFIA: a data integration framework for annotating high-throughput datasets.

Liam Harold Childs, Soulafa Mamlouk, Jörgen Brandt, Christine Sers, Ulf Leser.

Abstract

MOTIVATION: Integrating heterogeneous datasets from several sources is a common bioinformatics task that often requires implementing a complex workflow that intermixes database access, data filtering, format conversions, identifier mapping and other operations. Data integration is especially important when annotating next-generation sequencing data, where a multitude of diverse tools and heterogeneous databases can be used to provide a large variety of annotations for genomic locations, such as single nucleotide variants or genes. Each tool and data source is potentially useful for a given project, and often more than one is used in parallel for the same purpose. However, software that always produces all available data is difficult to maintain and quickly leads to an excess of data, creating an information overload rather than the desired goal-oriented and integrated result.
RESULTS: We present SoFIA, a framework for workflow-driven data integration with a focus on genomic annotation. SoFIA conceptualizes workflow templates as comprehensive workflows that cover as many data integration operations as possible in a given domain. However, these templates are not intended to be executed as a whole; instead, when given an integration task consisting of a set of input data and a set of desired output data, SoFIA derives a minimal workflow that completes the task. These workflows are typically fast and create exactly the information a user wants without requiring them to do any implementation work. With a comprehensive genome annotation template, we demonstrate the flexibility, extensibility and power of the framework in real-life case studies.
AVAILABILITY AND IMPLEMENTATION: https://github.com/childsish/sofia/releases/latest under the GNU General Public License.
CONTACT: liam.childs@hu-berlin.de
SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
© The Author 2016. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com.
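
The idea in RESULTS of deriving a minimal workflow from a comprehensive template can be illustrated with a small sketch. This is not SoFIA's actual code or API: the `Step` class, the `derive_workflow` function, and every step and entity name below are hypothetical, and the sketch assumes each output type is produced by exactly one template step. It walks backwards from the requested outputs and keeps only the steps needed to reach them from the provided inputs.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Step:
    """One operation in a hypothetical workflow template."""
    name: str
    requires: frozenset  # entity types the step consumes
    produces: str        # the single entity type the step yields


def derive_workflow(template, provided, requested):
    """Return the names of the steps needed to produce `requested`
    from `provided`, in an order that respects dependencies.

    Assumption: each entity type has at most one producing step.
    """
    producers = {step.produces: step for step in template}
    available = set(provided)  # entities that are given or already planned
    in_progress = set()        # entities currently being resolved (cycle check)
    ordered = []

    def resolve(entity):
        if entity in available:
            return
        if entity not in producers:
            raise ValueError(f"no step in the template produces {entity!r}")
        if entity in in_progress:
            raise ValueError(f"cyclic dependency involving {entity!r}")
        in_progress.add(entity)
        step = producers[entity]
        for dependency in sorted(step.requires):
            resolve(dependency)
        in_progress.discard(entity)
        available.add(entity)
        ordered.append(step.name)

    for entity in requested:
        resolve(entity)
    return ordered


# A toy annotation template; all names are illustrative only.
TEMPLATE = [
    Step("map_variant_to_gene", frozenset({"variant", "gene_model"}), "gene_id"),
    Step("lookup_gene_name", frozenset({"gene_id"}), "gene_name"),
    Step("predict_variant_effect", frozenset({"variant", "gene_model"}), "variant_effect"),
    Step("fetch_pathways", frozenset({"gene_id", "pathway_db"}), "pathway"),
]

workflow = derive_workflow(TEMPLATE, {"variant", "gene_model"}, ["gene_name"])
print(workflow)
```

Only the two steps on the path to `gene_name` are selected; the effect-prediction and pathway steps stay in the template but are never scheduled, which mirrors the goal of producing exactly the requested information rather than everything available.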

Year: 2016    PMID: 27187206    DOI: 10.1093/bioinformatics/btw302

Source DB: PubMed    Journal: Bioinformatics    ISSN: 1367-4803    Impact factor: 6.937


  4 in total

1.  DNA copy number changes define spatial patterns of heterogeneity in colorectal cancer.

Authors:  Soulafa Mamlouk; Liam Harold Childs; Daniela Aust; Daniel Heim; Friederike Melching; Cristiano Oliveira; Thomas Wolf; Pawel Durek; Dirk Schumacher; Hendrik Bläker; Moritz von Winterfeld; Bastian Gastl; Kerstin Möhr; Andrea Menne; Silke Zeugner; Torben Redmer; Dido Lenze; Sascha Tierling; Markus Möbs; Wilko Weichert; Gunnar Folprecht; Eric Blanc; Dieter Beule; Reinhold Schäfer; Markus Morkel; Frederick Klauschen; Ulf Leser; Christine Sers
Journal:  Nat Commun       Date:  2017-01-25       Impact factor: 14.919

2.  Variant information systems for precision oncology.

Authors:  Johannes Starlinger; Steffen Pallarz; Jurica Ševa; Damian Rieke; Christine Sers; Ulrich Keilholz; Ulf Leser
Journal:  BMC Med Inform Decis Mak       Date:  2018-11-21       Impact factor: 2.796

3.  Serial Analysis of Gene Mutations and Gene Expression during First-Line Chemotherapy against Metastatic Colorectal Cancer: Identification of Potentially Actionable Targets within the Multicenter Prospective Biomarker Study REVEAL.

Authors:  Jörg Kumbrink; Lisa Bohlmann; Soulafa Mamlouk; Torben Redmer; Daniela Peilstöcker; Pan Li; Sylvie Lorenzen; Hana Algül; Stefan Kasper; Dirk Hempel; Florian Kaiser; Marlies Michl; Harald Bartsch; Jens Neumann; Frederick Klauschen; Michael von Bergwelt-Baildon; Dominik Paul Modest; Arndt Stahler; Sebastian Stintzing; Andreas Jung; Thomas Kirchner; Reinhold Schäfer; Volker Heinemann; Julian W Holch
Journal:  Cancers (Basel)       Date:  2022-07-26       Impact factor: 6.575

4.  Malignant transformation and genetic alterations are uncoupled in early colorectal cancer progression.

Authors:  Soulafa Mamlouk; Tincy Simon; Laura Tomás; David C Wedge; Alexander Arnold; Andrea Menne; David Horst; David Capper; Markus Morkel; David Posada; Christine Sers; Hendrik Bläker
Journal:  BMC Biol       Date:  2020-09-07       Impact factor: 7.431
