Foundry: a message-oriented, horizontally scalable ETL system for scientific data integration and enhancement.

Ibrahim Burak Ozyurt, Jeffrey S Grethe.

Abstract

Data generated by scientific research enable further advancement in science through reanalyses and pooling of data for novel analyses. Biomedical research now gives researchers more data than they have ever had access to, yet finding data that match a researcher's requirements remains a major challenge, one that will only grow as more data are produced and shared. In this paper, we introduce a horizontally scalable distributed extract-transform-load system to tackle scientific data aggregation, transformation and enhancement for scientific data discovery and retrieval. We also introduce a data transformation language that lets biomedical curators transform and combine data and metadata from heterogeneous data sources. The applicability of the system to scientific data is illustrated in the biomedical and earth science domains.

Year:  2018        PMID: 30576493      PMCID: PMC6301337          DOI: 10.1093/database/bay130

Source DB:  PubMed          Journal:  Database (Oxford)        ISSN: 1758-0463            Impact factor:   3.451


