Eleni Mina1, Mark Thompson1, Rajaram Kaliyaperumal1, Jun Zhao2, Eelke van der Horst1, Zuotian Tatum1, Kristina M Hettne1, Erik A Schultes1, Barend Mons1, Marco Roos1.
Abstract
High-throughput experiments often produce far more results than can ever appear in the main text or tables of a single research article. In these cases, most new associations are archived either as supplemental information in an arbitrary format or in publisher-independent databases that can be difficult to find. These data are not only lost from scientific discourse, but also elude automated search, retrieval and processing. Here, we use the nanopublication model to make scientific assertions that were concluded from a workflow analysis of Huntington's Disease data machine-readable, interoperable, and citable. We followed the nanopublication guidelines to semantically model our assertions as well as their provenance metadata and authorship. We demonstrate interoperability by linking nanopublication provenance to the Research Object model. These results indicate that nanopublications can give researchers an incentive to expose data in an interoperable, machine-readable form for future use and preservation, and to receive credit for that effort. Nanopublications can play a leading role in hypothesis generation and offer opportunities for large-scale data integration.
Keywords: Data integration; Huntington’s disease; Interoperability; Nanopublication; Provenance; Research object; Workflows
Year: 2015 PMID: 26464783 PMCID: PMC4603842 DOI: 10.1186/2041-1480-6-5
Source DB: PubMed Journal: J Biomed Semantics
Figure 1. The template for the differential gene expression assertion. Orange diamonds refer to RDF resources defined by this nanopublication, whereas the gene (pink diamond) is defined by a bio2rdf resource. The Sequence Ontology (SO) was used for the predicate refers_to. The classes for Huntington’s disease, gene, and gene-disease association linked with altered gene expression are defined by the nifstd, bio2rdf and SIO ontologies, respectively.
Figure 2. Here too, orange diamonds refer to RDF resources defined by this nanopublication, whereas the gene (pink diamond) is defined by a bio2rdf resource. The classes promoter and biological region are defined by the nifstd ontology. The Semanticscience Integrated Ontology (SIO) and Sequence Ontology (SO) were used for the predicates overlaps_with and associated_with.
Figure 3. Schematic representation of the extension of SO with our own defined classes. The SO classes are depicted in yellow and the classes we defined in our case study in blue.
Definition of new classes
| Class | URI | Definition | Subclass of | Subclass of URI |
|---|---|---|---|---|
| chromatin_region | biosemantics:chromatin_region | A region of chromatin, likely to be involved in a biological process | biological_region | so:SO_0001411 |
| chromatin_state | biosemantics:chromantin_state | Annotation of chromatin states, defined by combinations of chromatin modification patterns (described in publication by Ernst et al. Nature, 2011) | feature_attribute | so:SO_0000733 |
| active_promoter | biosemantics:active_promoter | Open chromatin region, associated with promoters, transcriptionally active, defined by the most highly observed chromatin marks: H3K4me2, H3K4me3, H3K27ac, H3K9ac | chromatin_state | biosemantics:chromatin_state |
| weak_promoter | biosemantics:weak_promoter | Open chromatin region, associated with promoters, weak transcription activity, defined by the most highly observed chromatin marks: H3K4me1, H3K4me2, H3K4me3, H3K9ac | chromatin_state | biosemantics:chromatin_state |
| poised_promoter | biosemantics:poised_promoter | Open chromatin region, associated with promoters, described as a bivalent domain that has strong signals of both active and inactive histone marks. Most highly observed histone marks: H3K27me3, H3K4me2, H3K4me3 | chromatin_state | biosemantics:chromatin_state |
| heterochromatic | biosemantics:heterochromatic | Closed chromatin formation, transcriptionally inactive. It is associated with no histone marks | chromatin_state | biosemantics:chromatin_state |
Prefix biosemantics: http://rdf.biosemantics.org/ontologies/chromatin#.
Prefix so: http://purl.obolibrary.org/obo/.
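The class definitions above can be written down as RDF statements. The sketch below is a minimal illustration, not the authors' published ontology file: it expands prefixed names from the table using the two prefixes given above and produces one N-Triples line per subclass relation. Encoding the "subclass of" column with rdfs:subClassOf is an assumption, and only three rows are included for brevity.

```python
# Prefixes exactly as given under the table.
PREFIXES = {
    "biosemantics": "http://rdf.biosemantics.org/ontologies/chromatin#",
    "so": "http://purl.obolibrary.org/obo/",
}

# ASSUMPTION: the "subclass of" column is encoded with rdfs:subClassOf.
RDFS_SUBCLASS_OF = "http://www.w3.org/2000/01/rdf-schema#subClassOf"

# (class, parent) pairs copied from the table; a subset for brevity.
SUBCLASS_ROWS = [
    ("biosemantics:chromatin_region", "so:SO_0001411"),
    ("biosemantics:active_promoter", "biosemantics:chromatin_state"),
    ("biosemantics:heterochromatic", "biosemantics:chromatin_state"),
]


def expand(curie):
    """Expand a prefixed name such as 'so:SO_0001411' into a full URI."""
    prefix, local = curie.split(":", 1)
    return PREFIXES[prefix] + local


def to_ntriples(rows):
    """Return one N-Triples statement per (class, parent) pair."""
    return [
        f"<{expand(child)}> <{RDFS_SUBCLASS_OF}> <{expand(parent)}> ."
        for child, parent in rows
    ]


if __name__ == "__main__":
    for line in to_ntriples(SUBCLASS_ROWS):
        print(line)
```

Keeping the prefix map separate from the rows mirrors the table layout, so adding one of the remaining classes only requires one more pair in `SUBCLASS_ROWS`.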
Figure 4. The top part of this figure (above the triple line) shows the Nanopublication consisting of its three constitutive parts: Assertion, Provenance and Publication Info. Below the triple line is the Workflow pack which contains the Research Object (RO) as well as all the input and output data, workflows and results for the experiment. Note that the nanopublication and the workflow pack are separate entities that can exist in different locations. The two models are linked by the predicates shown in the figure as arrows crossing the triple line. In this way the nanopublication re-uses and exposes the detailed (and partially automatically generated) provenance from the RO. Note that multiple nanopublications can reference the same RO. Not shown in the figure is the possibility for the RO to link back to the nanopublication as one of the results of the experiment described by the RO.
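The three-part structure in this caption can be sketched as a set of quads (subject, predicate, object, named graph). This is a minimal illustration under stated assumptions: the `np:` terms come from the nanopublication schema, the `http://example.org/` URIs are hypothetical, and `prov:wasDerivedFrom` is only a stand-in for the linking predicates, which are shown in the figure but not named in the text.

```python
NP = "http://www.nanopub.org/nschema#"
PROV = "http://www.w3.org/ns/prov#"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


def nanopub_skeleton(base, research_object):
    """Return quads for an empty nanopublication linked to a Research Object.

    `base` and `research_object` are hypothetical URIs chosen by the caller.
    """
    head = base + "#Head"
    assertion = base + "#assertion"
    provenance = base + "#provenance"
    pubinfo = base + "#pubinfo"
    return [
        # The head graph names the three constitutive parts.
        (base, RDF_TYPE, NP + "Nanopublication", head),
        (base, NP + "hasAssertion", assertion, head),
        (base, NP + "hasProvenance", provenance, head),
        (base, NP + "hasPublicationInfo", pubinfo, head),
        # ASSUMPTION: prov:wasDerivedFrom stands in for the predicates that
        # cross from the nanopublication provenance to the Research Object.
        (assertion, PROV + "wasDerivedFrom", research_object, provenance),
    ]


def to_nquads(quads):
    """Serialise URI-only quads as N-Quads lines."""
    return "\n".join(f"<{s}> <{p}> <{o}> <{g}> ." for s, p, o, g in quads)


if __name__ == "__main__":
    quads = nanopub_skeleton(
        "http://example.org/np/1", "http://example.org/ro/hd-workflow"
    )
    print(to_nquads(quads))
```

Because the Research Object is referenced only by URI, several nanopublications can point at the same RO, as the caption notes.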
Figure 5. A. First, we queried the differentially expressed genes nanopublication to get all gene ids. Next, we filtered these ids by querying the biological region nanopublication. Subsequently, we queried the bio2RDF GO annotation dataset to select genes that are involved in the biological processes mentioned in the text. For this gene list, we queried the bio2RDF gene information data source to retrieve the symbols of human genes, which were then used to retrieve the drug targets from bio2RDF drugbank. Last, these drug targets were used to find drugs in the bio2RDF drugbank source. B. The process of querying GO for our set of biological processes (including the children concepts) and the mapping URI to filter genes from the bio2RDF GO annotation dataset. In this example we retrieve drug targets for one of the GO terms, autophagy.
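The query chain in this caption amounts to a sequence of filters and lookups. The sketch below mimics that chain with in-memory Python structures standing in for the SPARQL endpoints; it is an illustration, not the authors' queries. The gene symbols, GO terms and drug names are copied from the drug target results table in this section; using gene symbols as identifiers and collapsing the target and drug lookups into one dictionary are simplifying assumptions.

```python
# Stand-ins for the two nanopublication stores (steps 1 and 2).
DE_GENES = {"ABL1", "FKBP1A", "PSMD1"}        # differentially expressed
REGION_GENES = {"ABL1", "FKBP1A", "PSMD1"}    # overlap a biological region

# Stand-in for the GO annotation dataset (step 3): gene symbol -> GO term.
GO_ANNOTATIONS = {
    "ABL1": "autophagy",
    "FKBP1A": "protein folding",
    "PSMD1": "proteasomal protein catabolic process",
}

# Stand-in for the drug target and drug lookups: gene symbol -> drugs/ligands.
GENE_DRUGS = {
    "ABL1": ["Adenosine triphosphate (ATP)", "Imatinib", "Dasatinib", "Nilotinib"],
    "FKBP1A": ["Pimecrolimus", "Tacrolimus", "Sirolimus"],
    "PSMD1": ["Bortezomib"],
}


def drug_candidates(de_genes, region_genes, go_annotations, gene_drugs, go_terms):
    """Chain the filters from the caption and return gene -> drug list."""
    # Steps 1-2: genes that are differentially expressed AND overlap a region.
    candidates = de_genes & region_genes
    # Step 3: keep genes annotated with one of the requested GO terms.
    selected = {g for g in candidates if go_annotations.get(g) in go_terms}
    # Remaining steps (collapsed): look up drugs for each selected gene.
    return {g: gene_drugs.get(g, []) for g in sorted(selected)}


if __name__ == "__main__":
    print(drug_candidates(DE_GENES, REGION_GENES, GO_ANNOTATIONS,
                          GENE_DRUGS, {"autophagy"}))
```

In the federated setting each dictionary lookup would be a separate SPARQL query against a different source, with the gene identifier acting as the join key between them.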
Drug target results from the data integration query
| Gene | GeneSymbol | GoTerm | Target | Drug | DrugDescription [3] |
|---|---|---|---|---|---|
| | "ABL1" | "autophagy"@en | | Adenosine triphosphate (ATP) | |
| | | | | Imatinib | |
| | | | | Dasatinib | |
| | | | | Nilotinib | |
| | "FKBP1A" | "protein folding"@en | | Pimecrolimus | |
| | | | | Tacrolimus | |
| | | | | Sirolimus | |
| | "PPIF" | "protein folding"@en | | L-Proline | |
| | "PPIA" | | | | |
| | "PPIB" | | | | |
| | "PPIC" | | | | |
| | "TUBA4A" | "protein folding"@en | | Vincristine | |
| | | | | Cabazitaxel | |
| | | | | Podofilox | |
| | "PSMD1" | proteasomal protein catabolic process | | Bortezomib | |
[3] Although not drugs in the strict sense, these ligands may serve as starting points for (rational) drug design efforts for these targets.
Figure 6. A. SPARQL query for retrieving and adding GO terms from the GO ontology to our local graph. B. The example SPARQL query that retrieves drug targets and drug names from Drugbank that are associated with the genes that we identified as differentially expressed in Huntington’s Disease and that overlap with CpG islands. Red: our local nanopublication store; blue: the mapping URI; green: bio2RDF data sources.