Some useful statistical properties of position-weight matrices.

J M Claverie.

Abstract

Position-weight matrices (or profiles) are simple mathematical objects traditionally used to capture the information about local sequence patterns (or motifs) characteristic of a given structure or function. Although weight matrices can lead to fast database-scanning algorithms, their usage has been limited by the lack of a reliable method to assess the statistical significance of the matching scores. In this article I first review three different computation schemes for designing weight matrices from a block alignment of any (small or large) number of sequences. I then show that, for patterns spanning 10 positions or more, the best scores expected from matching random sequences are distributed according to the extreme value (Gumbel) distribution. The threshold of statistical significance assessed from this distribution perfectly delineates the range of scores characterizing "true positive" sequences (biologically significant matches). This result allows weight matrices to be used to scan an entire protein database for patterns in a highly sensitive way. MODEST (MOtif DEsign and Search Tools), a suite of programs in Unix/C, implements these statistical improvements and is available upon e-mail request (jmc@ncbi.nlm.nih.gov).
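The workflow described in the abstract (build a weight matrix from a block alignment, take the best score per sequence, then judge that score against a Gumbel distribution fitted to random sequences) can be illustrated with a minimal Python sketch. It is not the MODEST code and does not reproduce any of the three schemes reviewed in the article, which the abstract does not specify. The log-odds formula with a flat pseudocount, the uniform amino-acid background, the method-of-moments Gumbel fit, and all function names are assumptions made for illustration only.

```python
import math
import random

# Assumption: 20-letter amino-acid alphabet with a uniform background.
ALPHABET = "ACDEFGHIKLMNPQRSTVWY"


def build_pwm(block, pseudocount=1.0):
    """Log-odds weight matrix from a block of equal-length aligned sequences.

    Generic pseudocount scheme (an assumption, not the article's method).
    """
    background = 1.0 / len(ALPHABET)
    pwm = []
    for pos in range(len(block[0])):
        column = [seq[pos] for seq in block]
        total = len(column) + pseudocount * len(ALPHABET)
        pwm.append({
            a: math.log2(((column.count(a) + pseudocount) / total) / background)
            for a in ALPHABET
        })
    return pwm


def best_score(pwm, seq):
    """Highest matrix score over all ungapped windows of the sequence."""
    width = len(pwm)
    return max(
        sum(pwm[i][seq[start + i]] for i in range(width))
        for start in range(len(seq) - width + 1)
    )


def fit_gumbel(samples):
    """Method-of-moments estimate of Gumbel location (mu) and scale (beta)."""
    n = len(samples)
    mean = sum(samples) / n
    var = sum((s - mean) ** 2 for s in samples) / (n - 1)
    beta = math.sqrt(6.0 * var) / math.pi
    mu = mean - 0.5772156649015329 * beta  # Euler-Mascheroni constant
    return mu, beta


def gumbel_pvalue(score, mu, beta):
    """P(best random score >= score) under a Gumbel(mu, beta) model."""
    return 1.0 - math.exp(-math.exp(-(score - mu) / beta))


def random_sequence(length, rng):
    """Random sequence drawn uniformly from ALPHABET (hypothetical null model)."""
    return "".join(rng.choice(ALPHABET) for _ in range(length))


if __name__ == "__main__":
    rng = random.Random(0)
    # Hypothetical 10-position block; real blocks come from aligned motifs.
    block = ["GKSTLLNALA", "GKSTLANALA", "GKTTLLNALV", "GKSSLLNAIA"]
    pwm = build_pwm(block)
    # Best scores on random sequences approximate the null distribution.
    null_scores = [best_score(pwm, random_sequence(300, rng)) for _ in range(500)]
    mu, beta = fit_gumbel(null_scores)
    query = random_sequence(100, rng) + "GKSTLLNALA" + random_sequence(100, rng)
    score = best_score(pwm, query)
    print(f"score={score:.2f} p={gumbel_pvalue(score, mu, beta):.3g}")
```

In this sketch a significance threshold would be chosen as the score whose Gumbel tail probability falls below a chosen cutoff. The sequence length used for the random sample should match the sequences being scanned, since the location of the best-score distribution shifts with length.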

Year:  1994        PMID: 7952899     DOI: 10.1016/0097-8485(94)85024-0

Source DB:  PubMed          Journal:  Comput Chem        ISSN: 0097-8485


  8 in total

1.  Two domains of superfamily I helicases may exist as separate proteins.

Authors:  E V Koonin; K E Rudd
Journal:  Protein Sci       Date:  1996-01       Impact factor: 6.725

2.  Discovery of protein phosphorylation motifs through exploratory data analysis.

Authors:  Yi-Cheng Chen; Kripamoy Aguan; Chu-Wen Yang; Yao-Tsung Wang; Nikhil R Pal; I-Fang Chung
Journal:  PLoS One       Date:  2011-05-25       Impact factor: 3.240

3.  Statistical significance of cis-regulatory modules.

Authors:  Dustin E Schones; Andrew D Smith; Michael Q Zhang
Journal:  BMC Bioinformatics       Date:  2007-01-22       Impact factor: 3.169

4.  Large-scale discovery of promoter motifs in Drosophila melanogaster.

Authors:  Thomas A Down; Casey M Bergman; Jing Su; Tim J P Hubbard
Journal:  PLoS Comput Biol       Date:  2006-12-05       Impact factor: 4.475

5.  A unified statistical model to support local sequence order independent similarity searching for ligand-binding sites and its application to genome-based drug discovery.

Authors:  Lei Xie; Li Xie; Philip E Bourne
Journal:  Bioinformatics       Date:  2009-06-15       Impact factor: 6.937

6.  Combining position weight matrices and document-term matrix for efficient extraction of associations of methylated genes and diseases from free text.

Authors:  Arwa Bin Raies; Hicham Mansour; Roberto Incitti; Vladimir B Bajic
Journal:  PLoS One       Date:  2013-10-16       Impact factor: 3.240

7.  Position weight matrix, Gibbs sampler, and the associated significance tests in motif characterization and prediction. (Review)

Authors:  Xuhua Xia
Journal:  Scientifica (Cairo)       Date:  2012-10-23

8.  BALCONY: an R package for MSA and functional compartments of protein variability analysis.

Authors:  Alicja Płuciennik; Michał Stolarczyk; Maria Bzówka; Agata Raczyńska; Tomasz Magdziarz; Artur Góra
Journal:  BMC Bioinformatics       Date:  2018-08-14       Impact factor: 3.169
