Minimal-risk scoring matrices for sequence analysis.

T D Wu, C G Nevill-Manning, D L Brutlag.

Abstract

We introduce a minimal-risk method for estimating the frequencies of amino acids at conserved positions in a protein family. Our method, called minimal-risk estimation, finds the optimal weighting between a set of observed amino acid counts and a set of pseudofrequencies, which represent prior information about the frequencies. We compute the optimal weighting by minimizing the expected distance between the estimated frequencies and the true population frequencies, measured by either a squared-error or a relative-entropy metric. Our method accounts for the source of the pseudofrequencies, which arise either from the background distribution of amino acids or from applying a substitution matrix to the observed data. Our frequency estimates therefore depend on the size and composition of the observed data as well as the source of the pseudofrequencies. We convert our frequency estimates into minimal-risk scoring matrices for sequence analysis. A large-scale cross-validation study, involving 48 variants of seven methods, shows that the best performing method is minimal-risk estimation using the squared-error metric. Our method is implemented in the package EMATRIX, which is available on the Internet at http://motif.stanford.edu/ematrix.
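The weighting idea described above can be illustrated with a minimal sketch. It is not the EMATRIX implementation, and every function name in it is hypothetical. It assumes the simplest case: the pseudofrequencies q are fixed background frequencies, the estimate is (1 - w) f + w q for observed frequencies f from N sequences, and distance is squared error. Under those assumptions the expected squared error is (1 - w)^2 V + w^2 B, with V = (1 - sum of p_i^2) / N and B = sum of (q_i - p_i)^2 for true frequencies p, and it is minimized at w = V / (V + B). Because p is unknown, the sketch substitutes f for p, which is a simplification of this example rather than something the abstract specifies. The last function converts the estimates into log-odds scores against the background, a common way to build a scoring-matrix column.

```python
import math


def shrinkage_weight(counts, pseudo):
    """Weight on the pseudofrequencies under a squared-error criterion.

    Assumption: pseudofrequencies are fixed and independent of the data,
    and the unknown true frequencies are replaced by the observed ones.
    """
    n = sum(counts)
    if n == 0:
        return 1.0  # no data: rely entirely on the pseudofrequencies
    f = [c / n for c in counts]
    variance = (1.0 - sum(x * x for x in f)) / n             # plug-in for V
    bias_sq = sum((q - x) ** 2 for q, x in zip(pseudo, f))   # plug-in for B
    if variance + bias_sq == 0.0:
        return 0.0  # observed and pseudofrequencies coincide exactly
    return variance / (variance + bias_sq)


def estimate_frequencies(counts, pseudo):
    """Blend observed frequencies with pseudofrequencies."""
    n = sum(counts)
    if n == 0:
        return list(pseudo)
    w = shrinkage_weight(counts, pseudo)
    return [(1.0 - w) * c / n + w * q for c, q in zip(counts, pseudo)]


def log_odds_scores(freqs, background):
    """Log2 odds of each estimated frequency against the background."""
    return [
        math.log2(p / b) if p > 0 else float("-inf")
        for p, b in zip(freqs, background)
    ]


# Toy four-letter alphabet to keep the example short (real use: 20 amino acids).
background = [0.25, 0.25, 0.25, 0.25]
counts = [8, 1, 1, 0]
weight = shrinkage_weight(counts, background)
estimates = estimate_frequencies(counts, background)
scores = log_odds_scores(estimates, background)
```

The weight shrinks as more sequences are observed, which matches the abstract's statement that the estimates depend on the size and composition of the data. One known weakness of this plug-in version is that a perfectly conserved column has zero estimated variance and therefore receives no contribution from the pseudofrequencies.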

Year:  1999        PMID: 10421524     DOI: 10.1089/cmb.1999.6.219

Source DB:  PubMed          Journal:  J Comput Biol        ISSN: 1066-5277            Impact factor:   1.479


  4 in total

1.  Identification and characterization of a pSLA2 plasmid locus required for linear DNA replication and circular plasmid stable inheritance in Streptomyces lividans.

Authors:  Zhongjun Qin; Meijuan Shen; Stanley N Cohen
Journal:  J Bacteriol       Date:  2003-11       Impact factor: 3.490

2.  Fast index based algorithms and software for matching position specific scoring matrices.

Authors:  Michael Beckstette; Robert Homann; Robert Giegerich; Stefan Kurtz
Journal:  BMC Bioinformatics       Date:  2006-08-24       Impact factor: 3.169

3.  Dynamic use of multiple parameter sets in sequence alignment.

Authors:  Xiaoqiu Huang; Douglas L Brutlag
Journal:  Nucleic Acids Res       Date:  2006-12-19       Impact factor: 16.971

4.  eBLOCKs: enumerating conserved protein blocks to achieve maximal sensitivity and specificity.

Authors:  Qiaojuan Jane Su; Lin Lu; Serge Saxonov; Douglas L Brutlag
Journal:  Nucleic Acids Res       Date:  2005-01-01       Impact factor: 16.971
