Paolo Ciccarese, Stian Soiland-Reyes, Khalid Belhajjame, Alasdair JG Gray, Carole Goble, Tim Clark.
Abstract
BACKGROUND: Provenance is a critical ingredient for establishing trust in published scientific content. This is true whether we are considering a data set, a computational workflow, a peer-reviewed publication or a simple scientific claim with supportive evidence. Existing vocabularies such as Dublin Core Terms (DC Terms) and the W3C Provenance Ontology (PROV-O) are domain-independent and general-purpose, and they allow and encourage extensions to cover more specific needs. In particular, to track authoring and versioning information of web resources, PROV-O provides a basic methodology but no specific classes or properties for identifying or distinguishing between the various roles assumed by agents manipulating digital artifacts, such as author, contributor and curator.
Year: 2013 PMID: 24267948 PMCID: PMC4177195 DOI: 10.1186/2041-1480-4-37
Source DB: PubMed Journal: J Biomed Semantics
Figure 1. Depiction of the AlzSWAN knowledge creation and publishing process.
PAV authoring properties
| pav:authoredBy | Indicates an agent that originated or gave existence to the work that is expressed by the digital resource. The author of the content of a resource may be different from the creator of that resource representation ( |
| The author is usually not a software agent (which would be indicated with | |
| pav:authoredOn | Indicates the date this resource was authored by the agents given by |
| This property is normally used in a functional way, indicating the last time of authoring, although PAV does not formally restrict this. | |
| pav:curatedBy | Specifies an agent specialist responsible for shaping the expression in an appropriate format. Often the primary agent responsible for ensuring the quality of the representation. The curator may be different from the author ( |
| pav:curatedOn | Specifies the date this resource was curated. |
| This property is normally used in a functional way, indicating the last curation date, although PAV does not formally restrict this. | |
| pav:contributedBy | Specifies an agent that provided any sort of help in conceiving the work that is expressed by the digital artifact. |
| Contributions can take many forms, of which PAV defines the subproperties | |
| Note that | |
| pav:contributedOn | Indicates the date this resource was contributed on. |
Figure 2. A claim published in the AlzSWAN knowledge base, authored by the human agent ‘Golde T.’ and curated on the indicated date by the human agent ‘Wong G’. The artifact has been created by a human agent ‘Wu E.’ with the AlzSWAN Workbench and published by the ‘AlzSWAN Team’.
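The role assignments described for Figure 2 can be sketched as plain subject-predicate-object triples. The following is a minimal illustration in Python using only tuples. The `http://example.org/` URIs for the claim, the agents and the workbench are hypothetical placeholders, and the helper `agents_in_role` is introduced only for this sketch.

```python
PAV = "http://purl.org/pav/"

# Hypothetical identifiers, for illustration only.
EX = "http://example.org/"
claim = EX + "claim/1"

# One triple per role named in the Figure 2 caption.
triples = [
    (claim, PAV + "authoredBy", EX + "agent/golde-t"),
    (claim, PAV + "curatedBy", EX + "agent/wong-g"),
    (claim, PAV + "createdBy", EX + "agent/wu-e"),
    (claim, PAV + "createdWith", EX + "software/alzswan-workbench"),
]


def agents_in_role(triples, resource, role):
    """Return the objects linked to `resource` through the PAV property `role`."""
    return [o for (s, p, o) in triples if s == resource and p == PAV + role]


print(agents_in_role(triples, claim, "authoredBy"))
```

Keeping author, curator and creator as separate properties lets a consumer ask for each role independently rather than inferring it from a single generic creator field.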
PAV provenance properties
| pav:createdBy | An agent primarily responsible for encoding the digital artifact or resource representation. This creation is distinct from forming the content, which is indicated with |
| For instance, the author wrote ‘this species has bigger wings than normal’ in his log book. The curator, going through the log book and identifying important knowledge, formalizes this as ‘locus perculus has wingspan > 0.5 m’. The artifact creator enters this knowledge as a digital resource in the knowledge system, thus creating the digital artifact (say as JSON, RDF, XML or HTML). | |
| A different example is a news article. | |
| The software tool used by the creator to make the digital resource (say Protege, Wordpress or OpenOffice) can be indicated with | |
| pav:createdOn | The date of creation of the digital artifact or resource representation. The agents responsible can be indicated with |
| This property is normally used in a functional way, indicating the time of creation, although PAV does not formally restrict this. | |
| pav:createdWith | The software/tool used by the creator ( |
| pav:createdAt | The geo-location of the agents when creating the resource ( |
| pav:retrievedFrom | The URI where a resource has been retrieved from. Retrieval indicates that this resource has the same representation as the original resource. If the resource has been somewhat transformed, |
| pav:retrievedBy | An entity responsible for retrieving the data from an external source. The retrieving agent is usually a software entity, which has done the retrieval from the original source without performing any transcription. |
| Retrieval indicates that this resource has the same representation as the original resource. If the resource has been somewhat transformed, use | |
| pav:retrievedOn | The date the source for this resource was retrieved. This property is normally used in a functional way, although PAV does not formally restrict this. |
| pav:importedFrom | The original source of imported information. Import means that the content has been preserved, but transcribed somehow, for instance to fit a different representation model by converting formats. The imported resource does not have to be complete but should be consistent with the knowledge conveyed by the original resource. |
| pav:importedBy | An agent responsible for importing data from a source given by |
| pav:importedOn | The date the resource was imported from a source given by |
| This property is normally used in a functional way, although PAV does not formally restrict this. If the resource is later reimported, this should instead be indicated with | |
| pav:lastRefreshedOn | The date of the last import of the resource. This property is used if this version has been updated due to a re-import, rather than the import creating new resources related using |
| pav:providedBy | The |
| The provider might not coincide with the | |
| pav:sourceAccessedAt | A source which was accessed or consulted (but not retrieved, imported or derived from). For instance, a curator ( |
| Another example: I can access the page for tomorrow's weather in Boston ( | |
| pav:sourceAccessedBy | The agent who accessed the source given by |
| pav:sourceAccessedOn | The date when the original source given by |
| For instance, if the source accessed described the weather forecast for the next day, the time of source access can be crucial information. | |
| This property is normally used in a functional way, although PAV does not formally restrict this. If the source is subsequently checked again (say to verify validity), this should be indicated with | |
| pav:sourceLastAccessedOn | The date when the original source given by |
| This property can be useful together with |
Figure 3. Example of import from the EntrezGene database expressed using Turtle notation [19]. The two namespaces lses (life science entities) [http://purl.org/swan/1.2/lses/] and agents [http://purl.org/swan/1.2/agents/] are part of the SWAN suite of ontologies.
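The table above separates the date of the first import (pav:importedOn) from the date of a later re-import that updates the same resource (pav:lastRefreshedOn). A minimal Python sketch of that bookkeeping follows. The function name `record_import`, the dictionary layout and the example identifiers are assumptions made for illustration and are not part of PAV.

```python
def record_import(provenance, source, agent, when):
    """Update the PAV import provenance of one resource, held in a dict.

    The first import sets pav:importedFrom, pav:importedBy and pav:importedOn.
    A later re-import of the same resource only sets pav:lastRefreshedOn,
    leaving the original import date untouched.
    """
    if "pav:importedOn" not in provenance:
        provenance["pav:importedFrom"] = source
        provenance["pav:importedBy"] = agent
        provenance["pav:importedOn"] = when
    else:
        provenance["pav:lastRefreshedOn"] = when
    return provenance


# Hypothetical identifiers and dates, for illustration only.
gene = record_import({}, "ex:entrez-record", "ex:importer", "2013-01-15T10:00:00Z")
record_import(gene, "ex:entrez-record", "ex:importer", "2013-06-01T08:30:00Z")
print(gene)
```

The same pattern would apply to pav:sourceAccessedOn and pav:sourceLastAccessedOn, which the table describes in parallel terms.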
PAV versioning properties
| pav:version | The version identifier of a resource. This is a free text string, typical values are ‘1.5’ or ‘21’. The URI identifying the previous version can be provided using |
| pav:previousVersion | The previous version of a resource in a lineage. For instance a news article updated to correct factual information would point to the previous version of the article with |
| This property is normally used in a functional way, although PAV does not formally restrict this. A version identifier for a resource can be provided using the data property | |
| pav:derivedFrom | Derived from a different resource. Derivation concerns itself with derived knowledge. If this resource has the same content as the other resource, but has simply been transcribed to fit a different model (like XML to RDF or SQL to CSV), use |
| Details about who performed the derivation (e.g. who did the refining or modifications) may be indicated with | |
| pav:lastUpdateOn | The date of the last update of the resource. An update is a change which did not warrant making a new resource related using |
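The tables above distinguish three ways a resource can relate to its source: retrieval (same representation), import (same content, transcribed to a different representation) and derivation (derived knowledge). The rule can be sketched as a small Python function. The name `source_property` and its two boolean parameters are assumptions made for this illustration only.

```python
def source_property(same_representation, content_preserved):
    """Pick the PAV property that links a resource to its source.

    same_representation: the resource has the same representation as the original.
    content_preserved:   the content is preserved, but transcribed somehow
                         (for instance XML to RDF or SQL to CSV).
    """
    if same_representation:
        return "pav:retrievedFrom"
    if content_preserved:
        return "pav:importedFrom"
    # Neither holds: the resource expresses derived knowledge.
    return "pav:derivedFrom"


print(source_property(same_representation=False, content_preserved=True))
```

A source that was only consulted, without being retrieved, imported or derived from, falls outside this rule and is described with pav:sourceAccessedAt instead.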
Figure 4. Example illustrating versioning with PAV. Hypothesis A1 and Hypothesis A1’ have the same URI; they represent the same resource at different points in time.
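When pav:previousVersion is used in the functional way described in the versioning table, the versions of a resource form a simple chain. The Python sketch below walks such a chain. It assumes the links are held in a dictionary from a version URI to its previous version; the name `lineage` and the example URIs are hypothetical.

```python
def lineage(resource, previous_version):
    """Follow pav:previousVersion links from `resource` back to the oldest version.

    `previous_version` maps a version URI to the URI of its previous version.
    """
    chain = [resource]
    seen = {resource}
    while chain[-1] in previous_version:
        prev = previous_version[chain[-1]]
        if prev in seen:  # defensive stop on malformed, cyclic data
            break
        chain.append(prev)
        seen.add(prev)
    return chain


# Hypothetical version URIs, newest first.
links = {"ex:article/v3": "ex:article/v2", "ex:article/v2": "ex:article/v1"}
print(lineage("ex:article/v3", links))
```

An update that keeps the same URI, as in Figure 4, would not add a link to this chain; it would only change the pav:lastUpdateOn date of the existing resource.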
Figure 5. Example illustrating how PROV-O can be combined with PAV in order to provide a more detailed provenance record.
Figure 6. Example illustrating how Collection Ontology (CO) can be used with PAV in order to encode a collection (set) of people.
Figure 7. Relationships between the PAV ontology, the PROV ontology and all the projects listed in this article making use of PAV.
Figure 8. Example of annotation representation using Annotation Ontology and PAV.
Figure 9. The Open PHACTS VoID editor [37], a web-based wizard for creating a VoID dataset description, here representing a data format conversion by using PAV properties :, :, :. The VoID description itself (the generated RDF) has its own provenance, using pav:createdBy, pav:createdOn.
Figure 10. Gene disease nanopublication example, in TriG format, adapted from [40]. The nanopublication is expressed as three named graphs: the Assertion which expresses the claim of this nanopublication (Figure 11), the PublicationInfo, which details the attributions of this assertion (Figure 12), and Provenance, relating this nanopublication to the original data it was derived from (shown above).
Figure 11. Nanopublication assertion, adapted from [40]. Statistical association between gene and disease expressed using Bio2RDF and SIO ontology [41,42].
Figure 12. Nanopublication attribution specifying attribution information of the nanopublication in Figures 10 and 11. PAV is used to distinguish between the author of the nanopublication (the scientists who made the assertion expressed in Figure 10), and the creator of its digital representation, who in this case expressed the assertion as an RDF graph. The nanopublication is given a pav:version, also identified using dcterms:hasVersion. (The original RDF uses the PAV 1.2 term “versionNumber” which was renamed to “version” in PAV 2.0.)
Mapping from PAV to PROV-O
| prov:wasAttributedTo | pav:createdBy | The creator agent participated in some activity that generated the entity. |
| | pav:createdWith | The software agent participated in some activity that generated the entity. |
| | pav:contributedBy | The contributor participated in some activity that generated the entity. |
| | pav:authoredBy | The author participated in some activity that generated the entity. |
| | pav:curatedBy | The curator participated in some activity that generated the entity. |
| | pav:importedBy | The agent (usually software in this case) participated in some import activity, which generated the entity. |
| | pav:retrievedBy | The agent (usually software in this case) participated in some retrieval activity, which generated the entity. |
| prov:wasDerivedFrom & prov:alternateOf | pav:importedFrom | Import is a transformation of an entity into another. As the resulting entity is presenting aspects of the same thing, it is also an |
| | pav:retrievedFrom | Retrieval is construction of an entity into another. As the resulting entity is essentially (bytewise) the same, i.e. |
| prov:wasDerivedFrom | pav:derivedFrom | Derivation is |
| prov:wasRevisionOf | pav:previousVersion | The new version is a |
| prov:wasInfluencedBy | pav:sourceAccessedAt | The source Entity has an |
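The mapping table above can be applied mechanically: each PAV statement implies the same statement with the corresponding PROV-O property. The Python sketch below encodes the rows of the table as a dictionary and expands a list of triples. The names `PAV_TO_PROV` and `infer_prov`, and the prefixed-name strings used as identifiers, are assumptions for illustration.

```python
# PAV property -> PROV-O properties it maps to, taken from the table above.
PAV_TO_PROV = {
    "pav:createdBy": ["prov:wasAttributedTo"],
    "pav:createdWith": ["prov:wasAttributedTo"],
    "pav:contributedBy": ["prov:wasAttributedTo"],
    "pav:authoredBy": ["prov:wasAttributedTo"],
    "pav:curatedBy": ["prov:wasAttributedTo"],
    "pav:importedBy": ["prov:wasAttributedTo"],
    "pav:retrievedBy": ["prov:wasAttributedTo"],
    "pav:importedFrom": ["prov:wasDerivedFrom", "prov:alternateOf"],
    "pav:retrievedFrom": ["prov:wasDerivedFrom", "prov:alternateOf"],
    "pav:derivedFrom": ["prov:wasDerivedFrom"],
    "pav:previousVersion": ["prov:wasRevisionOf"],
    "pav:sourceAccessedAt": ["prov:wasInfluencedBy"],
}


def infer_prov(triples):
    """Return the PROV-O triples implied by the PAV triples in `triples`."""
    inferred = []
    for s, p, o in triples:
        for prov_property in PAV_TO_PROV.get(p, []):
            inferred.append((s, prov_property, o))
    return inferred


# Hypothetical identifiers, for illustration only.
print(infer_prov([("ex:dataset", "pav:importedFrom", "ex:source")]))
```

PAV properties without a row in the table, such as pav:version, produce no PROV-O statement in this sketch.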
Figure 13. Inferences from PAV authorship to existential PROV-O activities. pav:authoredBy is a subproperty of prov:wasAttributedTo, which, according to the PROV constraint attribution-inference, implies that there existed some _:activity that generated the resource and with which :paolo was associated.
Figure 14. Visualization of ChemSpider VoID provenance. Both subsets are attributed to chemspider.com (pav:retrievedBy), and derived from gz/zip files (pav:retrievedFrom). The VoID file itself is attributed to “me” (pav:createdBy) and derived from void.rdf (pav:derivedFrom). Note that the labels above are generated by ProvToolbox based on the URIs; the n-prefix indicates a prov:Entity. Figure converted from an SVG diagram which was produced by a Taverna workflow [46].
SKOS mappings of applicable PAV terms to Dublin core terms
| Broad match due to the common usage of | |
| Close match due to its common usage to mean someone who added to the Work of the resource (usually not just the digital representation), but not | |
| A PAV creator is a particular kind of DC Terms creator, which made the digital representation of the resource. | |
| Imported is a specialization of being derived from the related resource in whole. | |
| The resulting resource is | |
| The agent importing is essentially a specialized creator of the new resource, hence has close match | |
| A related resource from which the described resource is derived, but | |
Figure 15. Example of BIBFRAME representation of a book as a creative work () and its paperback instance (:) which have features such as dimensions and pages. Note how this work contains parts (tales) that themselves are works, each having individual bc:creators, and how sample:16300892 (the bibliographical record, not the work) is bf:derivedFrom another bibliographical record. Adapted from RDF/XML example at [60].
Figure 16. Example of tagging a Facebook post with DBPedia terms using the Common Tag vocabulary [41]. The provenance of the graph that contains the ctag statement expresses a chain of PRV data creation and access activities [61]. In TriG format, abbreviated from figure in [65].
Figure 17. SPARQL query over PRV provenance to find data creation, retrieval and access of a Facebook tag. The query finds the activity the tag was prv:createdBy, which was prv:performedBy the agent and prv:usedData that were prv:retrievedBy another activity, which prv:accessedResource the given REST API, prv:performedAt the given time.
Figure 18. SPARQL query over PAV provenance to find data creation, retrieval and access of a Facebook tag (equivalent to Figure 16). The tag was pav:importedFrom the given REST API, by the importing agent at the given import time.
Figure 19. Example of SPARQL CONSTRUCT generating PROV-O activities from PAV imports.
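The idea behind Figure 19, expanding compact PAV import statements into explicit PROV-O activities, can be sketched in Python. This is not the query from the figure. The choice of PROV-O terms (prov:Activity, prov:wasGeneratedBy, prov:used, prov:wasAssociatedWith, prov:endedAtTime), the function name `imports_to_activities`, the record layout and the blank-node labels are all assumptions made for this illustration.

```python
def imports_to_activities(records):
    """Expand PAV import records into PROV-style activity triples.

    `records` maps a resource identifier to a dict that holds
    "pav:importedFrom" and, optionally, "pav:importedBy" and "pav:importedOn".
    """
    triples = []
    for n, (resource, rec) in enumerate(sorted(records.items()), start=1):
        activity = f"_:import{n}"  # hypothetical blank-node label
        triples.append((activity, "rdf:type", "prov:Activity"))
        triples.append((resource, "prov:wasGeneratedBy", activity))
        triples.append((activity, "prov:used", rec["pav:importedFrom"]))
        if "pav:importedBy" in rec:
            triples.append((activity, "prov:wasAssociatedWith", rec["pav:importedBy"]))
        if "pav:importedOn" in rec:
            triples.append((activity, "prov:endedAtTime", rec["pav:importedOn"]))
    return triples


# Hypothetical identifiers and date, for illustration only.
records = {
    "ex:tag": {
        "pav:importedFrom": "ex:rest-api",
        "pav:importedBy": "ex:importer",
        "pav:importedOn": "2013-01-15T10:00:00Z",
    }
}
for triple in imports_to_activities(records):
    print(triple)
```

The expansion shows the trade-off between the two vocabularies: PAV states the import in three statements on the resource, while the PROV-O form introduces an intermediate activity that can carry further detail.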