Jasper J Koehorst1, Jesse C J van Dam1, Edoardo Saccenti1, Vitor A P Martins Dos Santos1,2, Maria Suarez-Diez1, Peter J Schaap1.
Abstract
Summary: To unlock the full potential of genome data and to enhance the interoperability and reusability of genome annotations, we have developed SAPP, a Semantic Annotation Platform with Provenance. SAPP is designed as an infrastructure supporting FAIR de novo computational genomics but can also be used to process and analyze existing genome annotations. SAPP automatically predicts, tracks and stores structural and functional annotations and associated dataset- and element-wise provenance in a Linked Data format, thereby enabling information mining and retrieval with Semantic Web technologies. This greatly reduces the administrative burden of handling multiple analysis tools and versions thereof and facilitates multi-level, large-scale comparative analysis.
Availability and implementation: SAPP is written in Java, runs on Unix-like operating systems and is freely available at https://gitlab.com/sapp. The documentation, examples and a tutorial are available at https://sapp.gitlab.io.
Contact: jasperkoehorst@gmail.com or peter.schaap@wur.nl.
Year: 2018 PMID: 29186322 PMCID: PMC5905645 DOI: 10.1093/bioinformatics/btx767
Source DB: PubMed Journal: Bioinformatics ISSN: 1367-4803 Impact factor: 6.937
Fig. 1. (A) The conversion module imports genome sequences in common formats. Annotation modules perform common tasks such as gene, tRNA, protein and protein domain annotation. Results are stored as Linked Data and consistency is ensured by the GBOL stack. (B) SPARQL query to retrieve the E-value score of the instances of the protein domain PF00465 across multiple bacterial genomes. (C) Distribution of E-values for protein domain PF00465 across multiple bacterial genomes: note the multimodality of the distribution. (D) Principal component analysis of functional similarities of 100 bacterial genomes from the Streptococcus (blue) and the Staphylococcus (orange) genera. PC1 and PC2 account for 51.4% and 10.1% of the variance in the dataset, respectively.
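The retrieval described for panel (B) can be illustrated with a minimal sketch of triple-pattern matching, the idea underlying a SPARQL query over Linked Data. This is not SAPP's code and does not use the GBOL vocabulary: the subject identifiers, the `ex:` predicates, the provenance labels and the E-values below are all hypothetical toy data, and a plain Python list stands in for a real triple store.

```python
from typing import Dict, Iterator, List, Optional, Tuple

Triple = Tuple[str, str, str]

# Hypothetical toy triples (subject, predicate, object). Real SAPP output
# uses the GBOL ontology; these "ex:" predicates are placeholders only.
TRIPLES: List[Triple] = [
    ("genomeA/protein1/domain1", "ex:accession", "PF00465"),
    ("genomeA/protein1/domain1", "ex:evalue", "1e-80"),
    ("genomeA/protein1/domain1", "ex:generatedBy", "ex:tool-run-1"),
    ("genomeB/protein7/domain1", "ex:accession", "PF00465"),
    ("genomeB/protein7/domain1", "ex:evalue", "3e-12"),
    ("genomeB/protein7/domain1", "ex:generatedBy", "ex:tool-run-2"),
    ("genomeB/protein9/domain1", "ex:accession", "PF00072"),
    ("genomeB/protein9/domain1", "ex:evalue", "2e-30"),
]


def match(s: Optional[str] = None,
          p: Optional[str] = None,
          o: Optional[str] = None) -> Iterator[Triple]:
    """Yield every triple that agrees with the given pattern.

    A value of None acts as a wildcard, like a variable in SPARQL.
    """
    for subj, pred, obj in TRIPLES:
        if s is not None and subj != s:
            continue
        if p is not None and pred != p:
            continue
        if o is not None and obj != o:
            continue
        yield (subj, pred, obj)


def evalues_for_domain(accession: str) -> Dict[str, float]:
    """Map each domain instance with this accession to its E-value."""
    results: Dict[str, float] = {}
    # First pattern: find domain instances carrying the accession.
    for subject, _, _ in match(p="ex:accession", o=accession):
        # Second pattern: look up the E-value of the same subject.
        for _, _, value in match(s=subject, p="ex:evalue"):
            results[subject] = float(value)
    return results


if __name__ == "__main__":
    for instance, evalue in evalues_for_domain("PF00465").items():
        print(instance, evalue)
```

Because every annotation is a set of statements about an identified element, the same pattern extends to provenance: matching on the placeholder predicate `ex:generatedBy` for a given subject would return the run that produced that particular domain hit.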