
A protein short motif search tool using amino acid sequence and their secondary structure assignment.

Arun Venkataraman, Teong Han Chew, Zeti Azura Mohamed Hussein, Mohd Shahir Shamsir.   

Abstract

We present a web server, Protein short motif search, that allows users to search simultaneously for a protein sequence motif and its secondary structure assignment. The web server can run very short motif searches against PDB structural data from the RCSB Protein Data Bank, with users defining the type of secondary structure of each amino acid in the sequence motif. The output uses 3D visualisation to highlight the position of the motif in the structure and on the corresponding sequence, so researchers can easily observe the locations and conformations of multiple motifs among the results. Protein short motif search also has an application programming interface (API) for interfacing with other bioinformatics tools. AVAILABILITY: The database is available for free at http://birg3.fbb.utm.my/proteinsms.

Keywords:  Protein short motif search; application programming interface (API); protein secondary structure; visualization

Year:  2011        PMID: 22355226      PMCID: PMC3280500          DOI: 10.6026/007/97320630007304

Source DB:  PubMed          Journal:  Bioinformation        ISSN: 0973-2063


Background:

Motifs are frequently observed in biological sequences, such as transcription factor binding sites in DNA sequences and catalytic sites in protein sequences. The purpose of our tool is to allow users to search simultaneously for a sequence motif and its secondary structure assignment. Because a protein sequence motif is identified on the basis of sequence similarity, without knowledge of the function conferred by the structural conformation that its assignment represents, it is important to determine where the conserved amino acids lie in the three-dimensional (3D) structure and to what extent they represent known functional regions. Many sequence motif search engines are available online, but each has limitations. Most search functions in motif databases, such as InterPro [1], BLOCKS [2] and PRINTS [3], are limited to previously identified motifs. The majority of motif search tools and databases do not have 3D visualisation and present their results as sequences. The position of the motif in the spatial arrangement is visualised either with third-party tools, such as Jmol, or with a mash-up that combines sequence searching and 3D visualisation, such as ScanProsite for PROSITE [4] and Motif3D for PRINTS [5]. However, ScanProsite only displays a static GIF image of the motif, whereas Motif3D cannot query the ultrashort linear motifs typically found in SLiM [6] and Mini Motif Miner [7]. More recent 3D motif visualisation tools allow interactive 3D visualisation within the conformational structure; these tools include seeMotif [8], 3MATRIX and 3MOTIF [9], and PDBeMotif [10]. However, PDBeMotif only allows users to add secondary structure patterns, not to assign a specific secondary structure to each amino acid in their motif queries.

Database development:

Data downloaded from the Protein Data Bank (PDB) are stored on a single server that serves as the web server and performs the search function. The server runs Ubuntu 9.04 Server Edition with Lighttpd 1.4.19 and FastCGI, with a FastCGI-enabled Apache 2.2.4 as a backup web server. Development used Perl (version 5.10) with MySQL for database management, together with JavaScript, the Yahoo User Interface (YUI) library, AJAX, JSON and Perl DBIx::Class. The website was built modularly, following a Model-View-Controller (MVC) framework, as a FastCGI application. The server also acts as a PDB file server for Jmol requests: for each successful query, PDB files are streamed to the Jmol applet through a MySQL table.

Software input:

A user enters a sequence motif and the corresponding secondary structure of each amino acid into the submission box. The query is then searched against PDB structure files, which are continuously updated. There are several variables that can narrow the search possibilities. For example, to search for the sequence motif “PEEL” in beta sheets, the user enters “PEEL” as the sequence query and “EEEE” as the secondary structure assignment (one “E” per residue; “H” denotes helices). The search returns all occurrences of the sequence with that secondary structure assignment. Users can use the wildcard “O” for any type of secondary structure or the wildcard “X” for unassigned secondary structures, which are usually found in undefined regions of the protein.
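The matching rule described above can be sketched in Python. This is only an illustration under stated assumptions, not the server's actual implementation (which is written in Perl): the function names `find_motif` and `_ss_matches` are hypothetical, each protein is assumed to be available as a sequence string plus an equal-length per-residue secondary structure string, unassigned residues are assumed to be encoded as "-", and the "O" wildcard is assumed to match unassigned residues as well.

```python
# Assumption: unassigned residues are encoded as "-" in the structure string.
UNASSIGNED = "-"


def _ss_matches(query_code, actual_code):
    """Compare one query code with one residue's secondary structure code."""
    if query_code == "O":  # wildcard: any type of secondary structure
        return True
    if query_code == "X":  # wildcard: unassigned secondary structure only
        return actual_code == UNASSIGNED
    return query_code == actual_code  # e.g. "E" (sheet) or "H" (helix)


def find_motif(sequence, structure, motif, ss_query):
    """Return 0-based start positions where both the motif and its
    per-residue secondary structure query match."""
    if len(sequence) != len(structure):
        raise ValueError("sequence and structure must have equal length")
    if len(motif) != len(ss_query):
        raise ValueError("motif and ss_query must have equal length")
    width = len(motif)
    hits = []
    for start in range(len(sequence) - width + 1):
        # The amino acid sequence must match exactly.
        if sequence[start:start + width] != motif:
            continue
        # Every residue must satisfy its secondary structure code.
        window = structure[start:start + width]
        if all(_ss_matches(q, s) for q, s in zip(ss_query, window)):
            hits.append(start)
    return hits


# Made-up example data, for illustration only.
example_sequence = "MKPEELAAPEELGG"
example_structure = "--EEEEHHHHHH--"
sheet_hits = find_motif(example_sequence, example_structure, "PEEL", "EEEE")
any_hits = find_motif(example_sequence, example_structure, "PEEL", "OOOO")
```

With this made-up data, the “EEEE” query matches only the first “PEEL”, whereas “OOOO” matches both occurrences regardless of their secondary structure.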

Software output:

The results page (Figure 1(a)) shows the query in the top half of the page and highlights the number of matches against the PDB structural data from the RCSB. A brief description of each protein is provided, and users can download all of the matching structures in FASTA format. The results page also shows a sequence alignment of each match and its corresponding secondary structure. We also added visualisation capability using Jmol: the structure is loaded in a new window, and the position of the match is highlighted in the structure (Figure 1(b)). Users can explore and export the structure using Jmol functionalities. An application programming interface (API) for Protein Short Motif Search was created to allow other developers to parse their data.
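As a rough illustration of how a match might be shown on the sequence together with its secondary structure, the following Python sketch prints three aligned lines. The layout, the caret marker, and the function name `format_hit` are assumptions made for this example; the actual results page layout is the one shown in Figure 1(a).

```python
def format_hit(sequence, structure, start, width):
    """Return a three-line text view: the sequence, its per-residue
    secondary structure, and carets marking the matched positions."""
    if len(sequence) != len(structure):
        raise ValueError("sequence and structure must have equal length")
    if start < 0 or width < 1 or start + width > len(sequence):
        raise ValueError("match lies outside the sequence")
    # Spaces pad up to the match start; carets sit under the matched residues.
    marker = " " * start + "^" * width
    return "\n".join([sequence, structure, marker])


# Made-up example data: "PEEL" matched at 0-based position 2.
view = format_hit("MKPEELAA", "--EEEEHH", 2, 4)
print(view)
```

A plain-text view of this kind would be easy for a script that consumes the results to reproduce, but it is not a description of the server's API output.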

Figure 1: (a) Protein SMS search results and (b) Jmol visualisation of the results.

Caveat:

The search is conducted against PDB files downloaded weekly from the PDB.

Conclusion:

The unique functionality of Protein short motif search is the ability to search for short motifs in which the secondary structure of each amino acid can be specifically assigned. It is intended to complement other search tools, with the API allowing users to automate the parsing of high-throughput data.

Future Development:

We intend to improve the tool by adding functionalities and annotations such as solvent accessibility values, clustering of the results according to SCOP or CATH classification, and links to other protein databases.

References:

1.  Increased coverage of protein families with the blocks database servers.

Authors:  J G Henikoff; E A Greene; S Pietrokovski; S Henikoff
Journal:  Nucleic Acids Res       Date:  2000-01-01       Impact factor: 16.971

2.  PRINTS and its automatic supplement, prePRINTS.

Authors:  T K Attwood; P Bradley; D R Flower; A Gaulton; N Maudling; A L Mitchell; G Moulton; A Nordle; K Paine; P Taylor; A Uddin; C Zygouri
Journal:  Nucleic Acids Res       Date:  2003-01-01       Impact factor: 16.971

3.  Motif3D: Relating protein sequence motifs to 3D structure.

Authors:  Anna Gaulton; Teresa K Attwood
Journal:  Nucleic Acids Res       Date:  2003-07-01       Impact factor: 16.971

4.  3MATRIX and 3MOTIF: a protein structure visualization system for conserved sequence motifs.

Authors:  Steven P Bennett; Lin Lu; Douglas L Brutlag
Journal:  Nucleic Acids Res       Date:  2003-07-01       Impact factor: 16.971

5.  SLiMDisc: short, linear motif discovery, correcting for common evolutionary descent.

Authors:  Norman E Davey; Denis C Shields; Richard J Edwards
Journal:  Nucleic Acids Res       Date:  2006-07-19       Impact factor: 16.971

6.  PDA v.2: improving the exploration and estimation of nucleotide polymorphism in large datasets of heterogeneous DNA.

Authors:  Sònia Casillas; Antonio Barbadilla
Journal:  Nucleic Acids Res       Date:  2006-07-01       Impact factor: 16.971

7.  seeMotif: exploring and visualizing sequence motifs in 3D structures.

Authors:  Darby Tien-Hao Chang; Ting-Ying Chien; Chien-Yu Chen
Journal:  Nucleic Acids Res       Date:  2009-05-28       Impact factor: 16.971

8.  InterPro: the integrative protein signature database.

Authors:  Sarah Hunter; Rolf Apweiler; Teresa K Attwood; Amos Bairoch; Alex Bateman; David Binns; Peer Bork; Ujjwal Das; Louise Daugherty; Lauranne Duquenne; Robert D Finn; Julian Gough; Daniel Haft; Nicolas Hulo; Daniel Kahn; Elizabeth Kelly; Aurélie Laugraud; Ivica Letunic; David Lonsdale; Rodrigo Lopez; Martin Madera; John Maslen; Craig McAnulla; Jennifer McDowall; Jaina Mistry; Alex Mitchell; Nicola Mulder; Darren Natale; Christine Orengo; Antony F Quinn; Jeremy D Selengut; Christian J A Sigrist; Manjula Thimma; Paul D Thomas; Franck Valentin; Derek Wilson; Cathy H Wu; Corin Yeats
Journal:  Nucleic Acids Res       Date:  2008-10-21       Impact factor: 16.971

9.  Minimotif miner 2nd release: a database and web system for motif search.

Authors:  Sanguthevar Rajasekaran; Sudha Balla; Patrick Gradie; Michael R Gryk; Krishna Kadaveru; Vamsi Kundeti; Mark W Maciejewski; Tian Mi; Nicholas Rubino; Jay Vyas; Martin R Schiller
Journal:  Nucleic Acids Res       Date:  2008-10-31       Impact factor: 16.971

10.  MSDmotif: exploring protein sites and motifs.

Authors:  Adel Golovin; Kim Henrick
Journal:  BMC Bioinformatics       Date:  2008-07-17       Impact factor: 3.169

