Andrew R Jones, Norman W Paton.
Abstract
BACKGROUND: Several data formats have been developed for large scale biological experiments, using a variety of methodologies. Most data formats contain a mechanism for allowing extensions to encode unanticipated data types. Extensions to data formats are important because the experimental methodologies tend to be fairly diverse and rapidly evolving, which hinders the creation of formats that will be stable over time.
Year: 2005 PMID: 16188029 PMCID: PMC1262694 DOI: 10.1186/1471-2105-6-235
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169
Figure 1. The BioMaterial package in MAGE-OM. The superclass BioMaterial has three subclasses: BioSample, LabeledExtract and BioSource.
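The inheritance structure described in the Figure 1 caption can be sketched in a few lines of Python. This is only an illustration of the superclass/subclass relationship: the class names come from the caption, while the `identifier` and `name` attributes, the docstring descriptions and the example values are assumptions made for the sketch, not part of MAGE-OM as described here.

```python
from dataclasses import dataclass


@dataclass
class BioMaterial:
    """Superclass for biological material (class name taken from Figure 1)."""
    identifier: str  # hypothetical attribute, for illustration only
    name: str        # hypothetical attribute, for illustration only


@dataclass
class BioSource(BioMaterial):
    """Assumed meaning: material as originally obtained."""


@dataclass
class BioSample(BioMaterial):
    """Assumed meaning: material produced by treating another material."""


@dataclass
class LabeledExtract(BioMaterial):
    """Assumed meaning: an extract that has been labelled for detection."""


# Hypothetical example records; the names are invented for the sketch.
materials = [
    BioSource("bm1", "liver tissue"),
    BioSample("bm2", "extracted RNA"),
    LabeledExtract("bm3", "labelled cDNA"),
]

# Code written against the superclass accepts every subclass instance.
material_names = [m.name for m in materials if isinstance(m, BioMaterial)]
```

Because all three subclasses share the superclass interface, software written against `BioMaterial` keeps working when a further subclass is added, which is the property that extension by inheritance relies on.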
The support for different tasks offered by different modelling structures: NVT (Name-Value-Type), ontologies, external files and extend model inheritance (EMI).
| Task | Modelling structure | Support |
| Search | NVT | Different sources will differ in attribute and value, therefore good for local data because NVT can be used to encode arbitrary properties as long as local users are aware of the data types that can be searched. Poor for non-local searches, as inconsistent attributes and values are likely to be used. |
| Search | Ontology | Okay if searched with exact matching terms; more difficult to support non-exact match because the search engine is unlikely to search within the ontology structure. |
| Search | External file | Not good; there may be no access to the structure of the file. Only information retrieval style requests can be made. |
| Search | EMI | Extensions can be searched locally but non-local searches will not be possible unless the extended models are shared. |
| Share | NVT | Good for local sharing, poor for sharing externally because properties may be encoded in NVT in inconsistent ways. |
| Share | Ontology | Good if terms agree (if the same ontology has been used). |
| Share | External file | Okay if file is in a standard format, otherwise bad (information may be difficult to access). |
| Share | EMI | Good for local sharing; cannot be shared externally unless the extended models are shared. |
| Read | NVT | Generally good because writer can be expressive (NVT is better than plain text); only problem is misinterpretation if NVT is used inconsistently. |
| Read | Ontology | Good because terms are well defined. |
| Read | External file | Good if file is in a standard format, otherwise bad. Other software may be required to access the file, such as for images, archive files, spreadsheets and so on. |
| Read | EMI | Good because writer can be as expressive as required. |
| Repeat | NVT | Okay for local case (especially good if data capture is automated); in general it is a hard problem for the non-local case. |
| Repeat | Ontology | Good for the non-local case. May be less good for local case if local terms are converted to ontology terms and cannot be converted back (ontology may not be able to express all local data in a lossless manner). |
| Repeat | External file | Okay if file is in a standard format, otherwise bad. |
| Repeat | EMI | Good for local case, poor for non-local case unless extensions are widely shared. |
| Comp man | NVT | Okay, but inconsistencies could be problematic if data types are encoded differently in different settings. |
| Comp man | Ontology | Good because terms are well defined and standard. |
| Comp man | External file | Okay if the file is in a standard format that can be easily processed. |
| Comp man | EMI | Generally good because the model developer can be expressive. |
| Comp auto | NVT | Good for local case; not good for the non-local case because NVT is likely to have been implemented differently. |
| Comp auto | Ontology | Good (consistent representation from different experiments). |
| Comp auto | External file | Okay if data are stored in a spreadsheet or tab-delimited text and descriptive metadata are stored correctly within the data format, or if the external file is in a standard format that can be easily processed. |
| Comp auto | EMI | Good for local case; cannot be done for the non-local case unless the extensions are widely shared. |
| Query | NVT | Worse than for search, because queries are generally more precise. |
| Query | Ontology | Generally good, but must query more than one language and the software for query evaluation may not be able to call out to a reasoning service (to make use of the ontology structure). |
| Query | External file | Not good (it must be assumed that there is no access to structure). |
| Query | EMI | Good for local case, cannot be queried non-locally unless the extensions are shared. |
| Analyse | NVT | Not possible; generic analyses must not depend on such data. |
| Analyse | Ontology | May not be relevant; analysis is not usually over ontology terms (but much better than NVT if it is). |
| Analyse | External file | Okay if data are stored in a spreadsheet or tab-delimited text and metadata are stored correctly within the data format, or if the external file is in a standard format that can be easily processed. |
| Analyse | EMI | Okay for local analysis but additional wrappers may be required to allow generic analysis software to access the data. Poor for non-local case as the format will have to be interpreted and software must be written. |
| Browse | NVT | Okay (probably better than plain text). |
| Browse | Ontology | Good, less chance of misinterpretation than NVT. |
| Browse | External file | Not good unless file is immediately readable. |
| Browse | EMI | Good because writer can be expressive. |
| Populate | NVT | Easy to populate but hard to enforce consistency. |
| Populate | Ontology | Easy as long as ontology is in place and easily accessible. |
| Populate | External file | Easy to populate but hard to enforce consistency. |
| Populate | EMI | Easy. |
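The contrast the table draws between NVT and EMI for searching can be made concrete with a small sketch. Everything in it is hypothetical: the `NVT` triple type, the two "labs", the `GrowthCondition` extension class and the temperature example are invented for illustration. It shows why an exact-match search over NVT triples misses records that encode the same property differently, whereas a typed attribute in a shared model extension does not have that problem.

```python
from dataclasses import dataclass
from typing import NamedTuple


class NVT(NamedTuple):
    """A generic name-value-type triple attached to a record."""
    name: str
    value: str
    type: str


# Two hypothetical labs describe the same property with different conventions.
lab_a = [NVT("temperature", "37", "celsius")]
lab_b = [NVT("Temp", "310.15", "K")]


def nvt_search(records, name, value):
    """Exact-match search over NVT triples; returns matching record indices."""
    return [
        i for i, triples in enumerate(records)
        if any(t.name == name and t.value == value for t in triples)
    ]


@dataclass
class GrowthCondition:
    """Hypothetical EMI-style extension: the attribute is part of the model."""
    temperature_celsius: float


def emi_search(records, temperature_celsius):
    """Search over a typed attribute declared in the extended model."""
    return [
        i for i, r in enumerate(records)
        if abs(r.temperature_celsius - temperature_celsius) < 1e-6
    ]


# Only lab A's record is found, although both describe the same temperature.
nvt_hits = nvt_search([lab_a, lab_b], "temperature", "37")

# Both records are found, but only because both sites use the same extension.
emi_hits = emi_search([GrowthCondition(37.0), GrowthCondition(37.0)], 37.0)
```

The EMI search succeeds here only because both records were created with the same extension class, which mirrors the table's caveat that non-local use depends on the extended models being shared.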
The relationship between the development of systems to support a data standard and the different modelling structures that could be used: NVT (Name-Value-Type), ontologies, external files and extend model inheritance (EMI).
| NVT | Near zero cost. |
| Ontology | Expensive (hard to develop ontology). |
| External file | No cost. |
| EMI | Fairly high cost because additional modelling is required in advance and the developer must understand the core model and how it can be extended. |
| NVT | Fairly easy as the code need not reflect the attributes, but difficult to ensure consistency as there is no explicit prompting from a controlled vocabulary. |
| Ontology | Some additional costs (importing ontology or calling an ontology service). |
| External file | Very easy (just upload the file). |
| EMI | Changes required to the interface to reflect the extensions unless the interface is created automatically from the model. |
| NVT | Few additional costs as the interface code need not reflect the attributes. |
| Ontology | Low cost as the queries can be generated from the model and ontology. |
| External file | The default is no functionality over the file; otherwise extra coding is required, which may be relatively costly. |
| EMI | Additional costs as interface code must be written to cover the extension unless the interface is generated from the model. |
| NVT | Low cost as no changes are required to the schema. |
| Ontology | No changes to database schema but small additional costs because the ontology has to be stored locally or linked externally. |
| External file | No changes required to the schema but small additional cost because more than one storage mechanism must be managed (database and file system). |
| EMI | Changes are required to the schema, which are likely to be expensive. |
| NVT | None; terms should be used with caution. NVT cannot restrict the cardinality or possible values. |
| Ontology | Good because a domain value can be enforced. |
| External file | No constraint or value checking. |
| EMI | Some quality enforcement because there will be guidelines as to the types of extensions allowed to a model and the model will enforce constraints on the value stored. |
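The difference in value checking between NVT and an ontology-backed field, as described in the last group of rows above, can be sketched as follows. The vocabulary `ORGANISM_TERMS`, both storage functions and the record layout are assumptions made for this example; a real system would typically consult an ontology file or service rather than a hard-coded set.

```python
# Hypothetical controlled vocabulary standing in for an ontology's terms.
ORGANISM_TERMS = {"Homo sapiens", "Mus musculus", "Saccharomyces cerevisiae"}


def store_nvt(record, name, value, type_):
    """NVT storage: any name, value and type is accepted without checking."""
    record.setdefault("nvt", []).append((name, value, type_))


def store_ontology_term(record, field, term, allowed_terms):
    """Ontology-backed storage: the value must be one of the allowed terms."""
    if term not in allowed_terms:
        raise ValueError(f"{term!r} is not a permitted term for {field!r}")
    record[field] = term


record = {}

# NVT accepts both entries, so neither values nor cardinality are restricted.
store_nvt(record, "organism", "mouse", "string")
store_nvt(record, "organism", "M. musculus", "string")

# The ontology-backed field accepts a known term and rejects a free-text one.
store_ontology_term(record, "organism", "Mus musculus", ORGANISM_TERMS)
try:
    store_ontology_term(record, "organism", "mouse", ORGANISM_TERMS)
    rejected = False
except ValueError:
    rejected = True
```

The NVT list ends up with two conflicting descriptions of the same property, while the ontology-backed field holds a single validated term.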
Figure 2. The shared components in different types of functional genomics experiments. The immunohistochemistry images were obtained from .
Importance of tasks for annotation about an experimental hypothesis (*hypothesis unlikely to be analysed).
| | Search | Share | Read | Repeat | Comp man | Comp auto | Query | Analyse | Browse | Populate |
| Importance | High | High | High | Med | High | Med | High | Low* | High | High |
Importance of tasks for descriptions of biological material (*analysis unlikely to be over biological samples).
| | Search | Share | Read | Repeat | Comp man | Comp auto | Query | Analyse | Browse | Populate |
| Importance | High | High | High | Med | Med | High | High | Low* | High | High |
Tasks for experimental protocols.
| | Search | Share | Read | Repeat | Comp man | Comp auto | Query | Analyse | Browse | Populate |
| Importance | Low | Med | High | High | Med | High | Low | Low | Med | High |
Tasks for numerical data.
| | Search | Share | Read | Repeat | Comp man | Comp auto | Query | Analyse | Browse | Populate |
| Importance | Med | High | Low | Low | Low | High | Med | High | Low | High |
Tasks for machine parameters.
| | Search | Share | Read | Repeat | Comp man | Comp auto | Query | Analyse | Browse | Populate |
| Importance | Low | Med | Med | High | Low | High | Low | Low | Low | Med |