
CORAL: A framework for rigorous self-validated data modeling and integrative, reproducible data analysis.

Pavel S Novichkov, John-Marc Chandonia, Adam P Arkin

Abstract

BACKGROUND: Many organizations face challenges in managing and analyzing data, especially when relevant datasets arise from multiple sources and methods. Analyzing heterogeneous datasets and additional derived data requires rigorous tracking of their interrelationships and provenance. This task has long been a Grand Challenge of data science and has more recently been formalized in the FAIR principles: that all data objects be Findable, Accessible, Interoperable, and Reusable, both for machines and for people. Adherence to these principles is necessary for proper stewardship of information, for testing regulatory compliance, for measuring the efficiency of processes, and for facilitating reuse of data-analytical frameworks.
FINDINGS: We present the Contextual Ontology-based Repository Analysis Library (CORAL), a platform that greatly facilitates adherence to all 4 of the FAIR principles, including the especially difficult challenge of making heterogeneous datasets Interoperable and Reusable across all parts of a large, long-lasting organization. To achieve this, CORAL's data model requires that data generators extensively document the context for all data, and our tools maintain that context throughout the entire analysis pipeline. CORAL also features a web interface for data generators to upload and explore data, as well as a Jupyter notebook interface for data analysts, both backed by a common API.
CONCLUSIONS: CORAL enables organizations to build FAIR data types on the fly as they are needed, avoiding the expense of bespoke data modeling. CORAL provides a uniquely powerful platform to enable integrative cross-dataset analyses, generating deeper insights than are possible using traditional analysis tools.
© The Author(s) 2022. Published by Oxford University Press GigaScience.

Keywords:  FAIR data; Jupyter; contexton; data analysis; data management; microtype; provenance

Year:  2022        PMID: 36251274      PMCID: PMC9575582          DOI: 10.1093/gigascience/giac089

Source DB:  PubMed          Journal:  Gigascience        ISSN: 2047-217X            Impact factor:   7.658


