Abstract
The number of microarray and other high-throughput experiments held in primary repositories keeps increasing, as do the size and complexity of the results generated in biomedical investigations. Initiatives have been started on standardization of content, object model, exchange format and ontology. However, backlogs persist and data cannot be exchanged between microarray repositories, which indicates a great need for a standard format and for data management. We have introduced a metadata framework that includes a metadata card and semantic nets that make experimental results visible, understandable and usable. These are encoded in syntax encoding schemes and represented in RDF (Resource Description Framework), can be integrated with other metadata cards and semantic nets, and can be exchanged, shared and queried. We demonstrated the performance and potential benefits through a case study on a selected microarray repository. We concluded that, with this metadata framework, the backlogs can be reduced and that information exchange and knowledge discovery queries become possible.
Keywords: Knowledge discovery; Metadata card; Metadata registry; Microarray; Semantic net
Year: 2011 PMID: 24052712 PMCID: PMC3776701 DOI: 10.2478/v10034-011-0047-7
Source DB: PubMed Journal: Balkan J Med Genet ISSN: 1311-0160 Impact factor: 0.519
MAdmf (microarray discovery metadata card framework).
| Component | What It Does |
|---|---|
| MAdmc (microarray discovery metadata card) | Supports the MINiML file |
| Semantic layer (semantic nets) | Details domain-specific topics, fortifies the intended meaning; discloses otherwise hidden data |
| Query layer (optional) | SPARQL queries |
| MAdmr (microarray discovery metadata registry) | Main files for MAdmf |
The content of MAdmf is as follows: MAdmc.xml: microarray discovery metadata card; MAdmc.xsd: schema file for MAdmc; Experimenter.rdf: SemNet (FOAF/RDF file) for experimenters; Result.rdf: SemNet (RuleML Datalog/RDF file) for the result/summary section; MAdmc.rq: query file in SPARQL to run on the SemNets.
MAdmc elements (2a) and obligation categories (2b) for elements.
| Elements | Attributes (ISO 11179) |
|---|---|
| Policy | |
| Title | Definition |
| Version | |
| Subject | |

| Obligation category | Meaning |
|---|---|
| Mandatory | An element must be supplied with a value to comply with MAdmc |
| Conditional | The usage of an element is dependent upon a particular condition |
| Optional | An element may be supplied with a value but it is not a requirement |
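The obligation categories above lend themselves to a simple conformance check. The following is a minimal sketch only: the source does not say which element belongs to which category, so the `OBLIGATIONS` mapping, the `MAdmc` root element name and the function `missing_mandatory` are all hypothetical placeholders.

```python
import xml.etree.ElementTree as ET

# Hypothetical assignment of elements to obligation categories.
# The source names the categories but not the per-element assignment.
OBLIGATIONS = {
    "Title": "mandatory",
    "Subject": "mandatory",
    "Version": "conditional",
    "Policy": "optional",
}


def missing_mandatory(card_xml):
    """Return the mandatory elements that are absent or empty in a card."""
    root = ET.fromstring(card_xml)
    missing = []
    for name, obligation in OBLIGATIONS.items():
        if obligation != "mandatory":
            continue
        node = root.find(name)
        # An element with no text does not count as "supplied with a value".
        if node is None or not (node.text or "").strip():
            missing.append(name)
    return missing


# Assumed card layout: a flat <MAdmc> root with one child per element.
card = "<MAdmc><Title>p53 and microRNAs</Title><Version>1.0</Version></MAdmc>"
print(missing_mandatory(card))
```

In this sketch only mandatory elements are enforced; a conditional element would need its condition spelled out before it could be checked.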
Figure 1. MAdmc.xsd (schema file for microarray discovery metadata card).
Figure 2. The MAdmr content.
Figure 3. MAdmc program. An application that reads the MINiML file, accepts values for additional fields and creates the metadata card (MAdmc.xml).
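The workflow in the Figure 3 caption (read a MINiML file, accept extra field values, write a metadata card) might be sketched as follows. This is not the authors' program: the embedded MINiML fragment is a heavily simplified stand-in with a placeholder title, the namespace URI is assumed, and the card layout (`MAdmc` root, `Identifier` element) and the function `build_card` are hypothetical.

```python
import xml.etree.ElementTree as ET

# Assumed MINiML namespace; check it against a real GEO export.
NS = {"m": "http://www.ncbi.nlm.nih.gov/geo/info/MINiML"}

# Simplified stand-in for a MINiML file; real exports carry many more fields.
MINIML = """<MINiML xmlns="http://www.ncbi.nlm.nih.gov/geo/info/MINiML">
  <Series iid="GSE12848">
    <Title>Example series title</Title>
  </Series>
</MINiML>"""


def build_card(miniml_text, extra_fields):
    """Build a metadata card element from MINiML text plus user-supplied fields."""
    source = ET.fromstring(miniml_text)
    series = source.find("m:Series", NS)
    card = ET.Element("MAdmc")  # hypothetical root element name
    ET.SubElement(card, "Identifier").text = series.get("iid")
    ET.SubElement(card, "Title").text = series.findtext(
        "m:Title", default="", namespaces=NS
    )
    # Additional fields are the values the caption says the program accepts.
    for name, value in extra_fields.items():
        ET.SubElement(card, name).text = value
    return card


card = build_card(MINIML, {"Version": "1.0"})
print(ET.tostring(card, encoding="unicode"))
```

Writing the result to `MAdmc.xml` would be one more call, `ET.ElementTree(card).write("MAdmc.xml")`, and validation against `MAdmc.xsd` would need a schema-aware library outside the standard library.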
Statements from GEO records encoded in the RuleML Datalog.
| Free text | |
|---|---|
| <Atom> | Encoded text (a.1) |
| <rulebase> | Encoded text (a.2) |
This is a Result SemNet of GEO Series record, GSE12848 (P53 gene related breast cancer record)
| <?xml version="1.0"?> MicroRNAs silence anti-proliferative genes. MicroRNAs are novel key players in the mammalian cellular proliferation network. Expression of microRNAs is down-regulated in senescent cells and in breast cancers harboring wild-type p53. MicroRNAs are repressed by p53 in an E2F1-mediated manner. MicroRNAs silence anti-proliferative genes, which themselves are E2F1 targets. MicroRNAs and transcriptional regulators appear to cooperate in the framework of a multi-gene transcriptional and post-transcriptional feed-forward loop. |
Figure 4. The graph output for the SemNet in Table 4 as validated by the RDF Validator.
Figure 5. A sample SPARQL query on the Result SemNet (run with the online “SPARQLer RDF Query Tool” at http://www.sparql.org/query.html).
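To illustrate how statements from the GSE12848 summary can be held as a semantic net and queried, here is a minimal sketch that stores them as subject-predicate-object triples and matches a single triple pattern, which is the basic building block of a SPARQL query. The predicate names (`silence`, `repressedBy`, `targetOf`, `downRegulatedIn`) and the `match` helper are hypothetical; they are not the vocabulary of Result.rdf or the query shown in the figure.

```python
# Statements from the record summary as (subject, predicate, object) triples.
# Predicate names are illustrative labels, not the actual Result.rdf terms.
TRIPLES = [
    ("MicroRNAs", "silence", "anti-proliferative genes"),
    ("MicroRNAs", "repressedBy", "p53"),
    ("anti-proliferative genes", "targetOf", "E2F1"),
    ("MicroRNAs", "downRegulatedIn", "senescent cells"),
]


def match(pattern, triples=TRIPLES):
    """Return variable bindings for one triple pattern; '?name' terms are variables."""
    results = []
    for triple in triples:
        binding = {}
        for term, value in zip(pattern, triple):
            if term.startswith("?"):
                # A repeated variable must bind to the same value each time.
                if binding.get(term, value) != value:
                    break
                binding[term] = value
            elif term != value:
                break
        else:
            results.append(binding)
    return results


# Roughly equivalent to: SELECT ?what WHERE { :MicroRNAs :silence ?what }
print(match(("MicroRNAs", "silence", "?what")))
```

A real SPARQL engine additionally joins several such patterns and resolves prefixed URIs; this sketch only shows the matching step.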
Data composition as of May 6, 2011.
| GEO Repository | Public | Unreleased | Total | Backlog |
|---|---|---|---|---|
| Platforms (GPL) | 8,713 | 494 | 9,207 | ~6.0% |
| Samples (GSM) | 557,206 | 121,682 | 678,888 | ~18.0% |
| Series (GSE) | 22,677 | 4,224 | 26,901 | ~16.0% |
| Datasets (GDS) | 2,721 | – | Number of experiments (Series records/2) | ~80.0% |
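The backlog column can be approximated from the counts in the table. The source gives only rounded percentages, so the formulas below are assumptions: backlog is taken as unreleased records over total records, and for Datasets the number of experiments is taken as half the total Series count, following the note in the table.

```python
# (public, unreleased) counts copied from the table above.
COUNTS = {
    "Platforms (GPL)": (8713, 494),
    "Samples (GSM)": (557206, 121682),
    "Series (GSE)": (22677, 4224),
}


def backlog_pct(public, unreleased):
    """Assumed definition: unreleased records as a percentage of all records."""
    return 100.0 * unreleased / (public + unreleased)


for name, (public, unreleased) in COUNTS.items():
    print(f"{name}: {backlog_pct(public, unreleased):.1f}%")

# Datasets: assume experiments = total Series records / 2, per the table note.
series_total = sum(COUNTS["Series (GSE)"])
experiments = series_total / 2
gds_backlog = 100.0 * (1 - 2721 / experiments)
print(f"Datasets (GDS): {gds_backlog:.1f}%")
```

Under these assumptions the Samples, Series and Datasets rows come out at about 17.9%, 15.7% and 79.8%, in line with the rounded figures in the table; the Platforms row comes out nearer 5.4% than 6.0%, so the authors may have used a slightly different denominator there.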