Emily Merrill (1), Stéphane Corlosquet (1), Paolo Ciccarese (2), Tim Clark (3), Sudeshna Das (2)

1. Massachusetts General Hospital, Partners Research Building, 65 Landsdowne St, Cambridge, MA, 02139, USA.
2. Massachusetts General Hospital, Partners Research Building, 65 Landsdowne St, Cambridge, MA, 02139, USA; Harvard Medical School, 25 Shattuck St, Boston, MA, 02115, USA.
3. Massachusetts General Hospital, Partners Research Building, 65 Landsdowne St, Cambridge, MA, 02139, USA; Harvard Medical School, 25 Shattuck St, Boston, MA, 02115, USA; School of Computer Science, University of Manchester, Oxford Road, Manchester, M13 9PL, UK.
Abstract
BACKGROUND: With the advent of inexpensive assay technologies, there has been unprecedented growth in genomics data and in the number of databases in which it is stored. In these databases, sample annotation using ontologies and controlled vocabularies is becoming more common. However, the annotation is rarely available as Linked Data, in a machine-readable format, or for standardized queries using SPARQL. This makes large-scale reuse and integration with other knowledge bases very difficult.

METHODS: To address this challenge, we have developed the second generation of our eXframe platform, a reusable framework for creating online repositories of genomics experiments. The second generation now publishes Semantic Web data. To accomplish this, we created an experiment model that covers provenance, citations, external links, assays, biomaterials used in the experiment, and the data collected during the process. The elements of our model are mapped to classes and properties from various established biomedical ontologies. Resource Description Framework (RDF) data is automatically produced using these mappings and indexed in an RDF store with a built-in SPARQL Protocol and RDF Query Language (SPARQL) endpoint.

CONCLUSIONS: Using the open-source eXframe software, institutions and laboratories can create Semantic Web repositories of their experiments, integrate them with heterogeneous resources, and make them interoperable with the vast Semantic Web of biomedical knowledge.
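
The mapping step described under METHODS can be illustrated with a minimal sketch. It is not eXframe's implementation: the field names, the `example.org` URIs, and the helper `record_to_ntriples` are hypothetical, chosen only to show how a table of field-to-property mappings could drive RDF generation. Only the Dublin Core and `rdf:type` URIs are real vocabulary terms. Output is serialized as N-Triples, which an RDF store could then load and expose for SPARQL queries.

```python
# Minimal sketch: turn one experiment record into N-Triples lines using a
# field-to-predicate mapping. All names below are illustrative assumptions,
# not eXframe's actual model or configuration.

RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"

# Hypothetical mapping from model fields to ontology properties.
FIELD_TO_PREDICATE = {
    "title": "http://purl.org/dc/terms/title",
    "contributor": "http://purl.org/dc/terms/contributor",
    "assay_type": "http://example.org/vocab/assayType",  # placeholder URI
}


def escape_literal(value):
    """Escape characters that must not appear raw in an N-Triples literal."""
    return (
        value.replace("\\", "\\\\")
        .replace('"', '\\"')
        .replace("\n", "\\n")
        .replace("\r", "\\r")
    )


def record_to_ntriples(subject_uri, class_uri, record):
    """Return N-Triples lines: one rdf:type triple plus one per mapped field."""
    lines = [f"<{subject_uri}> <{RDF_TYPE}> <{class_uri}> ."]
    for field, predicate in FIELD_TO_PREDICATE.items():
        if field not in record:
            continue  # unmapped or missing fields are simply skipped
        literal = escape_literal(str(record[field]))
        lines.append(f'<{subject_uri}> <{predicate}> "{literal}" .')
    return lines


if __name__ == "__main__":
    experiment = {"title": "Example expression study", "assay_type": "microarray"}
    for line in record_to_ntriples(
        "http://example.org/experiment/1",   # placeholder subject URI
        "http://example.org/vocab/Experiment",  # placeholder class URI
        experiment,
    ):
        print(line)
```

Keeping the mapping in a single table means that changing the target ontology only requires editing the table, not the serialization code. In this sketch all values are emitted as plain string literals; a fuller version would emit URIs for values that refer to ontology terms.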