Log-odds sequence logos.

Yi-Kuo Yu, John A Capra, Aleksandar Stojmirović, David Landsman, Stephen F Altschul.

Abstract

MOTIVATION: DNA and protein patterns are usefully represented by sequence logos. However, the methods for logo generation in common use lack a proper statistical basis, and are non-optimal for recognizing functionally relevant alignment columns.
RESULTS: We redefine the information at a logo position as a per-observation multiple alignment log-odds score. Such scores are positive or negative, depending on whether a column's observations are better explained as arising from relatedness or chance. Within this framework, we propose distinct normalized maximum likelihood and Bayesian measures of column information. We illustrate these measures on High Mobility Group B (HMGB) box proteins and a dataset of enzyme alignments. Particularly in the context of protein alignments, our measures improve the discrimination of biologically relevant positions.
AVAILABILITY AND IMPLEMENTATION: Our new measures are implemented in an open-source Web-based logo generation program, which is available at http://www.ncbi.nlm.nih.gov/CBBresearch/Yu/logoddslogo/index.html. A stand-alone version of the program is also available from this site.
CONTACT: altschul@ncbi.nlm.nih.gov
SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.

Published by Oxford University Press 2014. This work is written by US Government employees and is in the public domain in the US.
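
The contrast drawn in RESULTS, between a conventional information measure that is never negative and a log-odds score that can take either sign, can be illustrated with a small sketch. This is not the authors' implementation, and the abstract does not give their formulas. The sketch assumes a single symmetric Dirichlet prior with parameter 0.5 (the Jeffreys prior for a multinomial) as the "relatedness" model and a fixed background distribution as the "chance" model. The function names `relative_entropy_bits` and `bayesian_log_odds_bits` are hypothetical.

```python
import math

def relative_entropy_bits(counts, background):
    """Conventional column information: relative entropy of the observed
    frequencies against the background, in bits. Never negative."""
    n = sum(counts)
    return sum((c / n) * math.log2((c / n) / p)
               for c, p in zip(counts, background) if c > 0)

def bayesian_log_odds_bits(counts, background, alpha=0.5):
    """Per-observation log-odds score in bits (illustrative assumption):
    log of the marginal likelihood of the column under a symmetric
    Dirichlet(alpha) prior, divided by its likelihood under the background,
    then divided by the number of observations."""
    n = sum(counts)
    k = len(counts)
    # Log marginal likelihood of the ordered observations (Dirichlet-multinomial).
    log_related = (math.lgamma(k * alpha) - math.lgamma(k * alpha + n)
                   + sum(math.lgamma(alpha + c) - math.lgamma(alpha)
                         for c in counts))
    # Log likelihood of the same observations under the background ("chance").
    log_chance = sum(c * math.log(p)
                     for c, p in zip(counts, background) if c > 0)
    return (log_related - log_chance) / (n * math.log(2))

if __name__ == "__main__":
    uniform = [0.25] * 4  # assumed uniform DNA background
    for label, column in [("conserved", [20, 0, 0, 0]), ("mixed", [5, 5, 5, 5])]:
        print(label,
              round(relative_entropy_bits(column, uniform), 3),
              round(bayesian_log_odds_bits(column, uniform), 3))
```

With a uniform background, the fully conserved column scores positive under both measures. The evenly mixed column has zero relative entropy but a negative log-odds score, because chance explains it better than the prior-averaged relatedness model does.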

Year:  2014        PMID: 25294922      PMCID: PMC4318935          DOI: 10.1093/bioinformatics/btu634

Source DB:  PubMed          Journal:  Bioinformatics        ISSN: 1367-4803            Impact factor:   6.937

