Nick Juty, Nicolas Le Novère, Camille Laibe.
Abstract
The Minimum Information Required in the Annotation of Models Registry (http://www.ebi.ac.uk/miriam) provides unique, perennial and location-independent identifiers for data used in the biomedical domain. At its core is a shared catalogue of data collections, for each of which an individual namespace is created, and extensive metadata recorded. This namespace allows the generation of Uniform Resource Identifiers (URIs) to uniquely identify any record in a collection. Moreover, various services are provided to facilitate the creation and resolution of the identifiers. Since its launch in 2005, the system has evolved in terms of the structure of the identifiers provided, the software infrastructure, the number of data collections recorded, as well as the scope of the Registry itself. We describe here the new parallel identification scheme and the updated supporting software infrastructure. We also introduce the new Identifiers.org service (http://identifiers.org) that is built upon the information stored in the Registry and which provides directly resolvable identifiers, in the form of Uniform Resource Locators (URLs). The flexibility of the identification scheme and resolving system allows its use in many different fields, where unambiguous and perennial identification of data entities is necessary.
Year: 2011 PMID: 22140103 PMCID: PMC3245029 DOI: 10.1093/nar/gkr1097
Source DB: PubMed Journal: Nucleic Acids Res ISSN: 0305-1048 Impact factor: 16.971
Definitions
| A data collection gathers data of the same type (e.g. DNA, RNA or protein) and stores information regarding the same sets of ‘properties’ (e.g. sequence, references). It should make use of a well-defined internal identifier scheme. For example, the namespace ‘uniprot’ identifies a data collection whose subject is proteins, whose representation is protein sequence-centric, where each entry stores protein domain information and where the identifier scheme can be described using a specific regular expression. Similarly, ‘ec-code’ identifies a data collection that provides access to enzyme records and ‘chebi’ to an ontological representation of chemicals. |
| A resource is the physical location on the Web where information about a data record can be accessed. A resource provides the instances of all the records belonging to a collection. Since the record identifier is independent of physical location (URL), it may be resolved using any of the resources listed for that data collection. |
| The namespace is the unique syntactic string which defines a data collection. For example, given the identifier ‘urn:miriam:ec-code:1.1.1.1’, the namespace is defined as ‘ec-code’. This precise lexical string is used in both URN and URL forms of the identifiers. |
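As a minimal sketch of how the namespace sits inside an identifier, the following Python splits a MIRIAM URN such as ‘urn:miriam:ec-code:1.1.1.1’ into its namespace and record identifier, then rebuilds a URL form from the same two parts. The names `MiriamId`, `parse_miriam_urn` and `to_url_form` are hypothetical, and the `http://identifiers.org/<namespace>/<id>` layout is an assumption made for illustration; the definitions above only state that the same namespace string is used in both the URN and URL forms.

```python
from typing import NamedTuple


class MiriamId(NamedTuple):
    namespace: str  # identifies the data collection, e.g. "ec-code"
    record_id: str  # identifies one record in that collection, e.g. "1.1.1.1"


def parse_miriam_urn(urn: str) -> MiriamId:
    """Split 'urn:miriam:<namespace>:<id>' into namespace and record id."""
    prefix = "urn:miriam:"
    if not urn.startswith(prefix):
        raise ValueError(f"not a MIRIAM URN: {urn!r}")
    # Split on the first colon only: a record identifier may contain colons.
    namespace, sep, record_id = urn[len(prefix):].partition(":")
    if not sep or not namespace or not record_id:
        raise ValueError(f"missing namespace or identifier: {urn!r}")
    return MiriamId(namespace, record_id)


def to_url_form(ident: MiriamId, base: str = "http://identifiers.org") -> str:
    # Assumed URL layout: <base>/<namespace>/<id>
    return f"{base}/{ident.namespace}/{ident.record_id}"


ident = parse_miriam_urn("urn:miriam:ec-code:1.1.1.1")
print(ident.namespace, ident.record_id)
print(to_url_form(ident))
```

Both forms are derived from the same (namespace, identifier) pair, which is why neither depends on where the record is physically hosted.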
Figure 1. Concepts and component information captured in the MIRIAM Registry. The MIRIAM Registry collects information about data collections and resources, allowing them to be referenced using URIs. Red-bounded boxes represent concepts, while green ones depict specific instances. Each collection, which itself can be referenced via a URI, is assigned a namespace. This namespace can be combined with a suitable identifier in order to form a URI identifying the specific data record, independently of any physical locations holding that information. Each of these resolvable physical locations is regarded as an instance of the data record, and can itself be identified using a URI.
Information
| Identifier | A stable MIRIAM Registry identifier of the data collection. |
| Name | The name usually used to refer to the data collection. |
| Synonym(s) | Alternative name(s) of the data collection. |
| Namespace | The part of the URIs which identifies the data collection. For example ‘ec-code’ for enzymes. |
| Deprecated root URI(s) | MIRIAM URNs or URLs that have become obsolete over time. Deprecated identifiers are stored in the Registry, allowing conversion to current forms. |
| Definition | Short description of the data collection, indicating the focus of its content. |
| Identifier pattern | A regular expression pattern that describes the identifiers used within the data collection. |
| Reference(s) | Link(s) to documentation about the data collection and relevant publication(s). |
| Identifier | Each resource associated with a collection is given a unique identifier in the MIRIAM Registry. |
| Access URL | URL used to retrieve a given data entry, where the token ($id) is replaced with a specified identifier for a record. |
| Website | Root URL of the resource, usually its home page. |
| Description | Brief description of the resource, used to distinguish it from the other resources recorded for the same data collection. |
| Institution | The institution responsible for hosting the resource. |
| Health status | Not a textual field: the health status of the resource is indicated by the colour coding of the text area. |
| Deprecated Physical Location(s) | A list of deprecated resource(s) which are no longer usable to resolve information for this data collection. |
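The ‘Access URL’ field can be illustrated with a short sketch: the `$id` token in a resource's access URL is replaced by a record identifier to obtain a physical location for that record. The function name `resolve_access_url` and the two `example.org` access URLs are hypothetical and do not come from the Registry; percent-encoding the identifier before substitution is likewise an assumption, not documented Registry behaviour.

```python
from urllib.parse import quote


def resolve_access_url(access_url: str, record_id: str) -> str:
    """Replace the $id token in an access URL with a record identifier."""
    token = "$id"
    if token not in access_url:
        raise ValueError(f"access URL has no {token} token: {access_url!r}")
    # Assumption: encode reserved characters so the identifier is URL-safe.
    return access_url.replace(token, quote(record_id, safe=""))


# Hypothetical resources listed for one data collection.
resources = {
    "primary": "http://example.org/enzyme?ec=$id",
    "mirror": "http://mirror.example.org/ec/$id",
}

# One location-independent identifier resolves to every listed resource.
for name, access_url in resources.items():
    print(name, resolve_access_url(access_url, "1.1.1.1"))
```

Because the identifier is independent of any single location, the same record identifier can be substituted into the access URL of each resource recorded for the collection.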
Figure 2. An illustration of the variety of information captured for each data collection in the MIRIAM Registry. Some fields, described in the ‘Information’ table (Table 2), are highlighted: (1) Namespace; (2) Identifier pattern, which allows automated checking of identifier validity against the expected expression pattern; and (3) Resource health status, which provides information on resource up- and down-time. The health status is indicated through colour coding, while more details are presented on a separate page, reached via a link on the resource identifier (see inset).
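The automated validity check mentioned for the identifier pattern could look like the sketch below, which tests a candidate identifier against a collection's regular expression. The pattern shown is a hypothetical one for four dot-separated numbers, written for illustration; it is not the pattern stored in the Registry for any collection, and `is_valid_identifier` is an assumed helper name.

```python
import re

# Hypothetical pattern: four dot-separated integers. Not the Registry's own.
EC_LIKE_PATTERN = r"^\d+\.\d+\.\d+\.\d+$"


def is_valid_identifier(record_id: str, pattern: str) -> bool:
    """Return True if the whole identifier matches the collection's pattern."""
    # fullmatch rejects partial matches even if the pattern lacks anchors.
    return re.fullmatch(pattern, record_id) is not None


for candidate in ("1.1.1.1", "1.1.1", "P12345"):
    print(candidate, is_valid_identifier(candidate, EC_LIKE_PATTERN))
```

A check of this kind can reject malformed identifiers before any attempt is made to resolve them.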
Figure 3. An illustration of the process followed when dereferencing an Identifiers.org URL. The example URL is a location-independent identifier for an ec-code record. When used in a browser, it resolves to an intermediate HTML page that provides a list of possible physical locations where the data record can be retrieved. The default format of this document is HTML, while an RDF/XML version is available via content negotiation or by using the ‘format’ parameter in the URL (see the ‘Identifiers.org Resolving System’ section).
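The content negotiation step can be sketched as follows: the same URL is requested with a different HTTP `Accept` header depending on whether the HTML page or the RDF/XML document is wanted. The sketch only builds the requests and does not send them. The example URL layout and the helper name `build_request` are assumptions; `application/rdf+xml` is the standard media type for RDF/XML, but how the service responds to it is not verified here. The ‘format’ parameter is not shown because its accepted values are not given in this text.

```python
from urllib.request import Request

# Assumed example of a location-independent URL for an ec-code record.
EXAMPLE_URL = "http://identifiers.org/ec-code/1.1.1.1"


def build_request(url: str, want_rdf: bool = False) -> Request:
    """Build a request asking for HTML (default) or RDF/XML via Accept."""
    accept = "application/rdf+xml" if want_rdf else "text/html"
    return Request(url, headers={"Accept": accept})


html_request = build_request(EXAMPLE_URL)
rdf_request = build_request(EXAMPLE_URL, want_rdf=True)

# Same identifier, two representations selected by the Accept header.
print(html_request.full_url, html_request.get_header("Accept"))
print(rdf_request.full_url, rdf_request.get_header("Accept"))
```

Passing either request to `urllib.request.urlopen` would perform the actual lookup; that call is left out so the sketch runs without network access.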