
TOPSAN: a dynamic web database for structural genomics.

Kyle Ellrott, Christian M Zmasek, Dana Weekes, S Sri Krishna, Constantina Bakolitsa, Adam Godzik, John Wooley.

Abstract

The Open Protein Structure Annotation Network (TOPSAN) is a web-based collaboration platform for exploring and annotating structures determined by structural genomics efforts. Characterizing those structures is a challenge because the majority of the proteins themselves have not yet been characterized. In response, the TOPSAN platform facilitates collaborative annotation and investigation via a user-friendly web-based interface pre-populated with automatically generated information. Semantic web technologies expand and enrich TOPSAN's content through links to larger sets of related databases, and thus enable data integration from disparate sources and data mining via conventional query languages. TOPSAN can be found at http://www.topsan.org.


Year: 2010 PMID: 20961957 PMCID: PMC3013775 DOI: 10.1093/nar/gkq902

Source DB: PubMed Journal: Nucleic Acids Res ISSN: 0305-1048 Impact factor: 16.971

INTRODUCTION

Over the past decade, structural genomics (SG) efforts in the USA alone have determined the structures of more than 3000 previously uncharacterized proteins at a sustained rate of over 500 novel structure depositions per year to the Protein Data Bank (PDB) (1). Through the discovery of numerous new folds and an even greater number of variants of known folds (2), SG structures provide key input for innovative research into protein evolution and function. One of the main challenges presented by such high-throughput research is the timely annotation and integration of the resulting data so that they feed directly into ongoing research within the greater biological community. Traditional mechanisms for publication are simply too slow to keep pace with the speed of structure determination. As a result, over 90% of deposited SG structures are not yet described in the literature. The rate and volume of protein structures being produced require novel mechanisms to ensure that the knowledge gained from these structures is disseminated in a timely manner. Several new protein structure annotation platforms, using wiki-based methods, have been described (3–5). However, their content is largely static and derived from peer-reviewed publications, aspects that do not easily lend themselves to exploring new knowledge about structures. We developed The Open Protein Structure Annotation Network (TOPSAN) to serve as both an annotation platform and a communication platform, with the goal of facilitating and accelerating research relevant to SG structures. TOPSAN integrates a wide range of information about SG proteins, from different high-throughput experiments to literature, evolutionary analysis and even functional predictions. Through the implementation of a semantic web layer in the current version, TOPSAN enables database-like searches through its entire content and thus promotes further integration between its content and mainstream biology.

THE DATABASE

Content and interface

TOPSAN currently contains annotations for over 7250 structures from SG efforts from around the world. Prominent among these are several hundred structures that represent the first experimentally characterized members of their respective families, as well as many proteins for which there is extensive interest within the research community. Annotations and collaborations developed via TOPSAN have led to peer-reviewed publications for several dozen proteins, with many more currently in different stages of development. An overview of the database interface is given in Figure 1.

Figure 1.

Screenshot of a TOPSAN entry [PDB id: 3kk7 (9)]. Automatically generated data (data from external sources) are combined with human input (WYSIWYG editor, tagging, discussion page) and analyses (tools). Authorship and version tracking enable accountability and quality control. Semantic web technologies enable easy import and export of these data. Details of the data specifications can be found on the website.

Implementation

Implementation of the TOPSAN platform has been described in detail elsewhere (6). In brief, TOPSAN was developed using MindTouch, an enterprise open source collaboration and integration platform, which provides tools and scripting capabilities that were used to develop a dynamic website. At the backend, data are collected from a variety of sources, using multiple tools that have been integrated into the MindTouch platform. An application called TopsanApp retrieves protein information from external resources and creates and stores pages for specified proteins on the platform via an API. Data collected by TopsanApp are stored in a local MySQL database (termed topsanDB) that is used to generate individual protein pages with built-in functions for easy access and manipulation of information. TOPSAN additionally utilizes a semantic web-based data import system that enables rapid integration of new data. The semantic web is an architectural layer built on top of existing web pages; it consists of hidden embedded tags that employ standardized ontologies, allowing searches normally associated with structured databases to be carried out across unstructured data collections (7,8). For our purposes, the semantic web provides a unified framework for integrating data automatically imported from external sources with human-curated annotations. In this environment, scripting calls made from the Dekiscript environment build requests to access XML-formatted data available on the web and convert them into a semantic web-compatible format. Data from sites that already provide all records in a semantic web-compatible format can be imported directly with no manual conversion. Thus, semantic web data can be automatically imported to TOPSAN from Pfam, UniProt, the KEGG Pathway database and the PDB. Other sources of data can be imported with a variety of modular, easily adapted plug-ins.
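
The conversion step described above can be illustrated with a minimal sketch. It is not TOPSAN's Dekiscript code: it is written in Python, and the XML layout, the element names, the accession values (PF00000, P00000, map00000) and the http://example.org/topsan/ namespace are all placeholders assumed for illustration. It only shows the general idea of turning an XML record into subject-predicate-object statements, the basic unit of semantic web data.

```python
import xml.etree.ElementTree as ET

# Hypothetical XML record. The element names and accession values are
# placeholders, not a real TOPSAN, Pfam, UniProt, KEGG or PDB schema.
RECORD_XML = """
<protein id="3kk7">
  <family>PF00000</family>
  <accession>P00000</accession>
  <pathway>map00000</pathway>
</protein>
"""

# Hypothetical base URI used to mint subject and predicate identifiers.
BASE = "http://example.org/topsan/"


def xml_to_triples(xml_text):
    """Convert one XML record into (subject, predicate, object) triples."""
    root = ET.fromstring(xml_text)
    subject = BASE + "protein/" + root.attrib["id"]
    triples = []
    for child in root:
        # Each child element becomes one statement about the protein.
        predicate = BASE + "vocab/" + child.tag
        value = (child.text or "").strip()
        triples.append((subject, predicate, value))
    return triples


if __name__ == "__main__":
    for triple in xml_to_triples(RECORD_XML):
        print(triple)
```

Keeping every fact as a uniform three-part statement is what lets records from different sources be merged into one collection without agreeing on a shared table layout first.
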
Once imported, the data can be queried and utilized in the same web-based Dekiscript environment that was used for the import. Data export is handled by a variety of modular tools that make data available in formats including standard HTML, stripped-down XML, RDF/XML and RDFa. In addition to individual protein annotations, bulk compilations of the entire site are available as compressed files. TOPSAN also provides embeddable web interfaces to help other websites, such as Pfam, integrate TOPSAN annotations. Annotations can be viewed by anyone, but only registered users can contribute text. Accountability and ownership of ideas are preserved via time-stamped tracking of contributions.
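
As a rough illustration of querying and exporting such data, the sketch below matches statements against a pattern and serializes them as RDF/XML. It is written in Python rather than Dekiscript and does not reproduce TOPSAN's export tools; the vocabulary namespace, the accession values and the second protein identifier are placeholders assumed for the example. Only the RDF namespace URI is the standard one.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"  # standard RDF namespace
VOCAB_NS = "http://example.org/topsan/vocab/"  # hypothetical vocabulary
PROTEIN = "http://example.org/topsan/protein/"  # hypothetical subject prefix

# Placeholder statements; accession values are not real database entries.
TRIPLES = [
    (PROTEIN + "3kk7", VOCAB_NS + "family", "PF00000"),
    (PROTEIN + "3kk7", VOCAB_NS + "accession", "P00000"),
    (PROTEIN + "example-2", VOCAB_NS + "family", "PF00000"),
]


def match(triples, s=None, p=None, o=None):
    """Return the triples that fit a pattern; None acts as a wildcard."""
    return [
        t for t in triples
        if (s is None or t[0] == s)
        and (p is None or t[1] == p)
        and (o is None or t[2] == o)
    ]


def to_rdf_xml(triples):
    """Serialize triples as RDF/XML, one rdf:Description per subject."""
    ET.register_namespace("rdf", RDF_NS)
    ET.register_namespace("v", VOCAB_NS)
    root = ET.Element("{%s}RDF" % RDF_NS)
    descriptions = {}
    for s, p, o in triples:
        if not p.startswith(VOCAB_NS):
            raise ValueError("predicate outside the sketch vocabulary: " + p)
        if s not in descriptions:
            descriptions[s] = ET.SubElement(
                root, "{%s}Description" % RDF_NS, {"{%s}about" % RDF_NS: s}
            )
        local_name = p[len(VOCAB_NS):]
        prop = ET.SubElement(descriptions[s], "{%s}%s" % (VOCAB_NS, local_name))
        prop.text = o
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    # Which proteins are annotated with the placeholder family PF00000?
    for subject, _, _ in match(TRIPLES, p=VOCAB_NS + "family", o="PF00000"):
        print(subject)
    print(to_rdf_xml(TRIPLES))
```

A production system would more likely rely on an RDF library and a query language such as SPARQL; the hand-written pattern matcher here only stands in for that idea.
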

Conclusions and future perspectives

TOPSAN explores an important nexus between human analysis and computational data mining, neither of which can independently handle the challenges of research in an era of high-throughput data generation. Planned TOPSAN improvements include developing statistical methods for determining the reliability of information extracted from databases and testing means for improving semantic functionality. Additionally, TOPSAN emphasizes user-friendly access to external resources and databases, which might otherwise be unknown or difficult to access for some users. JavaScript-based widgets allow users to view and edit annotations while capturing and storing their input in a format that is compatible with the semantic web.

FUNDING

National Institutes of Health National Institute of General Medical Sciences Protein Structure Initiative (grant No. U54 GM074898). Funding for open access charge: University of California San Diego.

Conflict of interest statement. None declared.

9 in total

Review 1. A life science Semantic Web: are we there yet?

Authors: Eric Neumann
Journal: Sci STKE Date: 2005-05-10

2. Growth of novel protein structural data.

Authors: Michael Levitt
Journal: Proc Natl Acad Sci U S A Date: 2007-02-20 Impact factor: 11.205

Review 3. Biological knowledge management: the emerging role of the Semantic Web technologies.

Authors: Erick Antezana; Martin Kuiper; Vladimir Mironov
Journal: Brief Bioinform Date: 2009-05-19 Impact factor: 11.622

4. Update on the protein structure initiative.

Authors: John C Norvell; Jeremy M Berg
Journal: Structure Date: 2007-12 Impact factor: 5.006

5. The Protein Data Bank: a computer-based archival file for macromolecular structures.

Authors: F C Bernstein; T F Koetzle; G J Williams; E F Meyer; M D Brice; J R Rodgers; O Kennard; T Shimanouchi; M Tasumi
Journal: J Mol Biol Date: 1977-05-25 Impact factor: 5.469

6. Structure of a membrane-attack complex/perforin (MACPF) family protein from the human gut symbiont Bacteroides thetaiotaomicron.

Authors: Qingping Xu; Polat Abdubek; Tamara Astakhova; Herbert L Axelrod; Constantina Bakolitsa; Xiaohui Cai; Dennis Carlton; Connie Chen; Hsiu Ju Chiu; Thomas Clayton; Debanu Das; Marc C Deller; Lian Duan; Kyle Ellrott; Carol L Farr; Julie Feuerhelm; Joanna C Grant; Anna Grzechnik; Gye Won Han; Lukasz Jaroszewski; Kevin K Jin; Heath E Klock; Mark W Knuth; Piotr Kozbial; S Sri Krishna; Abhinav Kumar; Winnie W Lam; David Marciano; Mitchell D Miller; Andrew T Morse; Edward Nigoghossian; Amanda Nopakun; Linda Okach; Christina Puckett; Ron Reyes; Henry J Tien; Christine B Trame; Henry van den Bedem; Dana Weekes; Tiffany Wooten; Andrew Yeh; Jiadong Zhou; Keith O Hodgson; John Wooley; Marc André Elsliger; Ashley M Deacon; Adam Godzik; Scott A Lesley; Ian A Wilson
Journal: Acta Crystallogr Sect F Struct Biol Cryst Commun Date: 2010-07-31

7. PDBWiki: added value through community annotation of the Protein Data Bank.

Authors: Henning Stehr; Jose M Duarte; Michael Lappe; Jong Bhak; Dan M Bolser
Journal: Database (Oxford) Date: 2010-07-06 Impact factor: 3.451

8. TOPSAN: a collaborative annotation environment for structural genomics.

Authors: Dana Weekes; S Sri Krishna; Constantina Bakolitsa; Ian A Wilson; Adam Godzik; John Wooley
Journal: BMC Bioinformatics Date: 2010-08-17 Impact factor: 3.169

9. Proteopedia - a scientific 'wiki' bridging the rift between three-dimensional structure and function of biomacromolecules.

Authors: Eran Hodis; Jaime Prilusky; Eric Martz; Israel Silman; John Moult; Joel L Sussman
Journal: Genome Biol Date: 2008-08-03 Impact factor: 13.583
