Authors: Vinod Kumar Singh (1), Rohan Misra (1), Steven C Almo (2), Ulrich G Steidl (3), Hannes E Bülow (1, 4), Deyou Zheng (1, 2, 3)
Affiliations:
1. Department of Genetics, Albert Einstein College of Medicine, 1300 Morris Park Ave., Bronx, New York, USA.
2. Department of Biochemistry, Albert Einstein College of Medicine, 1300 Morris Park Ave., Bronx, New York, USA.
3. Department of Cell Biology, Albert Einstein College of Medicine, 1300 Morris Park Ave., Bronx, New York, USA.
4. Department of Neuroscience, Albert Einstein College of Medicine, 1300 Morris Park Ave., Bronx, New York, USA.
Abstract
SUMMARY: The functional substrings of a biopolymer sequence, often referred to as motifs, define the specificity of its interactions with other biomolecules. Many computational algorithms and software tools have been developed for finding such motifs in sequences whose individual elements are single characters, such as DNA and protein sequences. However, in more complex scenarios the motifs exist in non-single-letter contexts, for example, preferred patterns of chemical modifications on proteins, DNAs, RNAs, or polysaccharides. To search for those motifs, we describe a new method that converts the modified sequence elements to representative single-letter codes and then uses a modified Gibbs-sampling algorithm to define the position-specific scoring matrix (PSSM) representing the motif(s). As a proof of principle, we describe the implementation and application of an R package for discovering heparan sulfate (HS) motifs in glycan sequences, which are important in regulating protein-protein interactions. This software can be valuable for analyzing high-throughput glycoprotein binding data using microarrays with HS oligosaccharides or other biological polymers.
AVAILABILITY AND IMPLEMENTATION: HSMotifDiscover is freely available as an open-source R package released under an MIT license at https://github.com/bioinfoDZ/HSMotifDiscover and is also available as an app at https://hsmotifdiscover.shinyapps.io/HSMotifDiscover_ShinyApp/.
SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
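The two steps named in the summary, recoding multi-character sequence elements as single letters and then sampling motif positions to build a PSSM, can be illustrated with a minimal Python sketch. This is not the HSMotifDiscover implementation (which is an R package) and does not reproduce its modifications to Gibbs sampling; it is a plain single-motif Gibbs sampler with a uniform background. The function names, the alphabetical letter assignment, the pseudocount, and the toy glycan sequences are all assumptions made for illustration.

```python
import random


def encode(tokens_per_seq):
    """Map each distinct multi-character token to one letter (A, B, C, ...).

    Hypothetical coding scheme: tokens are sorted and lettered in order,
    so this sketch supports at most 26 distinct tokens.
    """
    alphabet = sorted({tok for seq in tokens_per_seq for tok in seq})
    code = {tok: chr(ord("A") + i) for i, tok in enumerate(alphabet)}
    return ["".join(code[tok] for tok in seq) for seq in tokens_per_seq], code


def build_pssm(segments, letters, pseudo=0.5):
    """Per-column letter frequencies (with a pseudocount) for aligned segments."""
    pssm = []
    for j in range(len(segments[0])):
        counts = {c: pseudo for c in letters}
        for seg in segments:
            counts[seg[j]] += 1
        total = sum(counts.values())
        pssm.append({c: counts[c] / total for c in letters})
    return pssm


def gibbs_motif(seqs, width, iters=200, seed=0):
    """Plain Gibbs sampling: one motif occurrence of fixed width per sequence."""
    rng = random.Random(seed)
    letters = sorted(set("".join(seqs)))
    background = 1.0 / len(letters)  # uniform background, a simplification
    starts = [rng.randrange(len(s) - width + 1) for s in seqs]
    for _ in range(iters):
        for i, s in enumerate(seqs):
            # Build the PSSM from every sequence except the one being resampled.
            others = [seqs[k][starts[k]:starts[k] + width]
                      for k in range(len(seqs)) if k != i]
            pssm = build_pssm(others, letters)
            # Score each candidate start by its likelihood ratio vs. background.
            weights = []
            for p in range(len(s) - width + 1):
                w = 1.0
                for j in range(width):
                    w *= pssm[j][s[p + j]] / background
                weights.append(w)
            # Sample a new start position in proportion to those scores.
            starts[i] = rng.choices(range(len(weights)), weights=weights)[0]
    segments = [s[st:st + width] for s, st in zip(seqs, starts)]
    return build_pssm(segments, letters), starts


# Toy input: each glycan is a list of monosaccharide tokens (illustrative only).
glycans = [
    ["GlcA", "GlcNAc", "IdoA2S", "GlcNS6S", "GlcA"],
    ["GlcNAc", "IdoA2S", "GlcNS6S", "GlcA", "GlcNAc"],
    ["GlcA", "GlcA", "IdoA2S", "GlcNS6S", "GlcNAc"],
    ["IdoA2S", "GlcNS6S", "GlcNAc", "GlcA", "GlcNAc"],
]
encoded, code = encode(glycans)
pssm, starts = gibbs_motif(encoded, width=2)
```

Here `code` records which letter stands for which token, so the columns of `pssm` can be translated back into the original modified units once sampling finishes. Because the sampler is stochastic, a practical run would typically use several seeds and compare the resulting matrices.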