Kiyoko F Aoki-Kinoshita, Akira R Kinjo, Mizuki Morita, Yoshinobu Igarashi, Yi-An Chen, Yasumasa Shigemoto, Takatomo Fujisawa, Yukie Akune, Takeo Katoda, Anna Kokubu, Takaaki Mori, Mitsuteru Nakao, Shuichi Kawashima, Shinobu Okamoto, Toshiaki Katayama, Soichi Ogishima.
Abstract
BACKGROUND: Linked Data has gained some attention recently in the life sciences as an effective way to provide and share data. As a part of the Semantic Web, data are linked so that a person or machine can explore the web of data. Resource Description Framework (RDF) is the standard means of implementing Linked Data. In the process of generating RDF data, not only are data linked to one another, but the links themselves are also characterized by ontologies, thereby allowing the types of links to be distinguished. Although defining an ontology carries a high labor cost for data providers, the merit lies in the higher level of interoperability with data analysis and visualization software. This increase in interoperability facilitates the multi-faceted retrieval of data, and the appropriate data can be quickly extracted and visualized. Such retrieval is usually performed using SPARQL (SPARQL Protocol and RDF Query Language), which is used to query RDF data stores. For the database provider, such interoperability will surely lead to an increase in the number of users.
Keywords: Alzheimer’s disease; DDBJ; Data integration; Faceted search interface; Glycobiology; PDBj; Semantic Web
Year: 2015 PMID: 25973165 PMCID: PMC4429360 DOI: 10.1186/2041-1480-6-3
Source DB: PubMed Journal: J Biomed Semantics
Generated RDF of Alzheimer’s disease data
| Data type | URL of RDF (in Turtle format) file | Number of triples |
|---|---|---|
| Gene variation data (AlzGene) | | 66856 |
| Gene expression data (fold changes and P values) | | 289686 |
| Gene annotation data | | 965966 |
| PubMed co-occurrence data | | 66856 |
| Total | | 1389364 |
Figure 1. (a) RDF diagram for gene variation data (AlzGene). (b) RDF diagram for gene expression data (fold changes and P values). (c) RDF diagram for gene annotation data. (d) RDF diagram for PubMed co-occurrence data.
Figure 2. Snapshot of the facet viewer app for the gene data of Alzheimer’s disease.
SPARQL results of differentially expressed genes in Alzheimer’s Disease data
| Probe set ID | Ratio | P value | Gene symbol | SwissProt URI |
|---|---|---|---|---|
| | 2.23 | 4.3 × 10⁻³ | F5 | |
| | 2.01 | 3.7 × 10⁻³ | F9 | |
| | 1.55 | 2.0 × 10⁻² | PLG | |
| | 0.60 | 4.2 × 10⁻² | VWF | |
PDB traversing results from the four genes obtained in (1)
| UniProt ID | Gene (protein) name | PDB entries with small compounds * |
|---|---|---|
| B4DU26 | cDNA FLJ50218, highly similar to Coagulation factor V | - |
| P00740 | Coagulation factor IX (EC 3.4.21.22) | 1RFN (PBZ), 3LC3 (IYX), 3LC5 (IZX) |
| A6PVI2 | Plasminogen | - |
| A8K7V7 | cDNA FLJ75522, highly similar to Homo sapiens von Willebrand factor (VWF), mRNA | - |
*PDB entries are given as 4-letter PDB IDs, and small compounds as 3-letter HETATM IDs (in parentheses).
Figure 3. An example of a glycan entry page in RINGS, describing the details of a particular glycan structure. Originally sent in HTML format, such information can be RDF-ized by sending the corresponding data in RDF/XML. This can be done by transforming the data such that they are organized as triples (e.g., Turtle or N-Triples format), which can be directly converted to RDF/XML.
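To make the idea of organizing entry data as triples concrete, the following is a minimal sketch that writes a few statements about a glycan entry in N-Triples syntax. The `http://example.org/` namespace, the entry identifier, the `Glycan` class, and the label text are hypothetical placeholders and are not the vocabulary used by RINGS; only the `rdf:type` and `rdfs:label` URIs are standard. The final conversion to RDF/XML is not shown.

```python
# Hypothetical namespace for illustration only; not the RINGS vocabulary.
EX = "http://example.org/"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
RDFS_LABEL = "http://www.w3.org/2000/01/rdf-schema#label"


def iri(value):
    """Wrap an IRI in angle brackets, as N-Triples requires."""
    return "<" + value + ">"


def literal(text):
    """Quote a plain string literal, escaping backslashes, quotes and newlines."""
    escaped = (
        text.replace("\\", "\\\\")
        .replace('"', '\\"')
        .replace("\n", "\\n")
        .replace("\r", "\\r")
    )
    return '"' + escaped + '"'


def to_ntriples(triples):
    """Serialize (subject, predicate, object) terms, one statement per line."""
    return "".join(f"{s} {p} {o} .\n" for s, p, o in triples)


# A hypothetical glycan entry described by two statements.
glycan = iri(EX + "glycan/entry1")
triples = [
    (glycan, iri(RDF_TYPE), iri(EX + "vocab#Glycan")),
    (glycan, iri(RDFS_LABEL), literal('An example "glycan" entry')),
]

print(to_ntriples(triples), end="")
```

Each output line is a complete subject-predicate-object statement, which is why a line-oriented format such as N-Triples is a convenient intermediate form before producing other RDF serializations.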
Figure 4. Linked Data used in the use case. Starting from the PDB entry PDBr:1ATP, links are followed through UniProt and PROSITE (Bio2RDF portal) to obtain other PDB entries sharing the same PROSITE motifs. Prefixes are defined as follows: PDBr: = http://pdbj.org/rdf/, UP: = http://purl.uniprot.org/uniprot/, PDBo: = http://rdf.wwpdb.org/schema/pdbx-v40.owl#, UPc: = http://purl.uniprot.org/core/, UPd: = http://purl.uniprot.org/database/PROSITE, PS: = http://purl.uniprot.org/prosite/ (this is transferred to http://bio2rdf.org/prosite:), B2: = http://bio2rdf.org/bio2rdf_resource:, rdfs: = http://www.w3.org/2000/01/rdf-schema#. The diagram was created using Cytoscape (Smoot, 2011).
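The traversal described in this caption can be sketched over a small in-memory set of triples. This is only an illustration of the link-following pattern, not the SPARQL query used in the paper: the three prefixes are taken from the caption, but the `ex:` predicates and every identifier other than `PDBr:1ATP` are hypothetical placeholders.

```python
# Prefixes as given in the Figure 4 caption.
PREFIXES = {
    "PDBr": "http://pdbj.org/rdf/",
    "UP": "http://purl.uniprot.org/uniprot/",
    "PS": "http://purl.uniprot.org/prosite/",
}


def expand(curie):
    """Expand a prefixed name such as 'PDBr:1ATP' into a full URI."""
    prefix, _, local = curie.partition(":")
    return PREFIXES[prefix] + local


# Toy data: the predicates and all IDs except PDBr:1ATP are made up.
TRIPLES = [
    ("PDBr:1ATP", "ex:hasProtein", "UP:PROT_A"),
    ("UP:PROT_A", "ex:hasMotif", "PS:MOTIF_X"),
    ("UP:PROT_B", "ex:hasMotif", "PS:MOTIF_X"),
    ("PDBr:ENTRY_2", "ex:hasProtein", "UP:PROT_B"),
    ("UP:PROT_C", "ex:hasMotif", "PS:MOTIF_Y"),
    ("PDBr:ENTRY_3", "ex:hasProtein", "UP:PROT_C"),
]


def objects(subject, predicate):
    """Follow a link forward: all objects of (subject, predicate, ?)."""
    return {o for s, p, o in TRIPLES if s == subject and p == predicate}


def subjects(predicate, obj):
    """Follow a link backward: all subjects of (?, predicate, obj)."""
    return {s for s, p, o in TRIPLES if p == predicate and o == obj}


def entries_sharing_motifs(start):
    """PDB entry -> protein -> motif -> other proteins -> other PDB entries."""
    found = set()
    for protein in objects(start, "ex:hasProtein"):
        for motif in objects(protein, "ex:hasMotif"):
            for other_protein in subjects("ex:hasMotif", motif):
                found |= subjects("ex:hasProtein", other_protein)
    found.discard(start)  # the starting entry trivially shares its own motifs
    return found


related = entries_sharing_motifs("PDBr:1ATP")
print(sorted(expand(entry) for entry in related))
```

In a real setting each hop would cross a different data source (PDBj, UniProt, Bio2RDF), which is what makes shared URIs and explicit prefixes necessary.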