Jerven Bolleman, Edouard de Castro, Delphine Baratin, Sebastien Gehant, Beatrice A Cuche, Andrea H Auchincloss, Elisabeth Coudert, Chantal Hulo, Patrick Masson, Ivo Pedruzzi, Catherine Rivoire, Ioannis Xenarios, Nicole Redaschi, Alan Bridge.
Abstract
BACKGROUND: Genome and proteome annotation pipelines are generally custom built and not easily reusable by other groups. This leads to duplication of effort, increased costs, and suboptimal annotation quality. One way to address these issues is to encourage the adoption of annotation standards and technological solutions that enable the sharing of biological knowledge and tools for genome and proteome annotation.
Keywords: SPARQL; function; prediction; protein
Year: 2020 PMID: 32034905 PMCID: PMC7007698 DOI: 10.1093/gigascience/giaa003
Source DB: PubMed Journal: Gigascience ISSN: 2047-217X Impact factor: 6.524
Figure 1: RDF namespace declarations for prefixes used in other figures.
Figure 2: Part of the HAMAP rule for signature MF_00005 as a SPARQL CONSTRUCT query.
Figure 3: SPARQL CONSTRUCT block of Fig. 2 extended with metadata expressed as RDF reification quads.
Figure 4: Example protein record in an RDF format suitable for HAMAP SPARQL rules.
Simple protein-annotation associations of HAMAP rule MF_00005 for UniProtKB entry B1YJ35
| Protein | Annotation |
|---|---|
| uniprot:B1YJ35 | “GO:0004055” |
| uniprot:B1YJ35 | “GO:0005524” |
| uniprot:B1YJ35 | “GO:0006526” |
| uniprot:B1YJ35 | “GO:0005737” |
| uniprot:B1YJ35 | “ec:6.3.4.5” |
| uniprot:B1YJ35 | “rhea:10932” |
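
The associations in the table above mix three identifier schemes (GO terms, an EC number, and a Rhea reaction). As a minimal sketch of how such protein/annotation pairs might be handled downstream, the Python below groups them by the prefix before the first colon. The pairs are copied from the table; the names `ASSOCIATIONS`, `group_by_namespace`, and `GROUPED` are illustrative assumptions and do not come from the article or from HAMAP.

```python
from collections import defaultdict

# Protein/annotation pairs copied from the table above
# (HAMAP rule MF_00005 applied to UniProtKB entry B1YJ35).
ASSOCIATIONS = [
    ("uniprot:B1YJ35", "GO:0004055"),
    ("uniprot:B1YJ35", "GO:0005524"),
    ("uniprot:B1YJ35", "GO:0006526"),
    ("uniprot:B1YJ35", "GO:0005737"),
    ("uniprot:B1YJ35", "ec:6.3.4.5"),
    ("uniprot:B1YJ35", "rhea:10932"),
]


def group_by_namespace(pairs):
    """Group annotation identifiers by (protein, prefix before the first colon).

    Hypothetical helper for illustration only.
    """
    grouped = defaultdict(list)
    for protein, annotation in pairs:
        namespace, _, _local_id = annotation.partition(":")
        grouped[(protein, namespace)].append(annotation)
    return dict(grouped)


GROUPED = group_by_namespace(ASSOCIATIONS)
```

Keying on the (protein, namespace) pair keeps the grouping usable if associations for further entries are appended to the same list.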
Figure 5: Example protein sequence/signature match in RDF syntax.
Figure 6: Example query for comparison of annotations generated by the different systems, taking into account whether a system inserts the full GO or UniProt keyword hierarchy or only leaf nodes.
Figure 7: (A) Hypothetical triples to describe a sequence entry from RNAcentral.org that is a member of the Rfam RNA family RF00003 (U1 spliceosomal RNA family). (B) Hypothetical rule associating RF00003 to the GO term GO:0005685 (definition: “A ribonucleoprotein complex that contains small nuclear RNA U1”).
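
The comparison problem named in the Figure 6 caption can be illustrated outside SPARQL. The sketch below is not the query from the figure. It shows one possible approach, assuming a child-to-parents mapping is available: reduce each system's annotation set to its leaf terms before comparing, so that a system that also inserts ancestor terms is not counted as disagreeing. The hierarchy `TOY_PARENTS` and the `toy:` identifiers are hypothetical placeholders, not real GO terms or UniProt keywords.

```python
def leaf_terms(terms, parents):
    """Return the terms in `terms` that are not an ancestor of another term in the set.

    `parents` maps a term to its direct parent terms (hypothetical structure).
    """
    def ancestors(term):
        seen = set()
        stack = list(parents.get(term, ()))
        while stack:
            current = stack.pop()
            if current not in seen:
                seen.add(current)
                stack.extend(parents.get(current, ()))
        return seen

    implied = set()
    for term in terms:
        implied |= ancestors(term)
    return set(terms) - implied


# Hypothetical toy hierarchy: toy:leaf -> toy:mid -> toy:root.
TOY_PARENTS = {"toy:leaf": ["toy:mid"], "toy:mid": ["toy:root"]}

system_a = {"toy:leaf", "toy:mid", "toy:root"}  # inserts the full hierarchy
system_b = {"toy:leaf"}                         # inserts only the leaf node

# Compare after normalising both sets to leaf terms.
same = leaf_terms(system_a, TOY_PARENTS) == leaf_terms(system_b, TOY_PARENTS)
```

Normalising to leaves is only one option. Expanding both sets to their full ancestor closure before comparing would serve the same purpose, at the cost of larger sets.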