Construction of validated, non-redundant composite protein sequence databases.

A J Bleasby, J C Wootton.

Abstract

A strategy has been developed for the construction of a validated, comprehensive composite protein sequence database. Entries are amalgamated from primary source databases by a largely automated set of processes in which redundant and trivially different entries are eliminated. A modular approach has been adopted to allow scientific judgement to be used at each stage of database processing and amalgamation. Source databases are assigned a priority depending on the quality of sequence validation and commenting. In each pairwise comparison of databases, entries from the lower-priority database are rejected according to optionally defined redundancy criteria based on sequence segment mismatches. Efficient algorithms for this methodology are embodied in the COMPO software system. COMPO has been applied for over 2 years in the construction and regular updating of the OWL composite protein sequence database from the source databases NBRF-PIR, SWISS-PROT, a GenBank translation retrieved from the feature tables, NBRF-NEW, NEWAT86, PSD-KYOTO and the sequences contained in the Brookhaven protein structure databank. OWL is part of the ISIS integrated data resource of protein sequence and structure [Akrigg et al. (1988) Nature, 335, 745-746]. The modular nature of the integration process greatly facilitates the frequent updating of OWL following releases of the source databases. The extent of redundancy in these sources is revealed by the comparison process. The advantages of a robust composite database for sequence similarity searching and information retrieval are discussed.
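
The priority-based amalgamation described in the abstract can be illustrated with a minimal Python sketch. This is not the COMPO implementation: the function names (`is_redundant`, `merge_by_priority`), the `max_mismatches` parameter, and the redundancy test itself (equal length and at most a given number of mismatched positions) are assumptions made for illustration, standing in for the segment-mismatch criteria the authors mention but the abstract does not specify. For simplicity the sketch also drops duplicates within a single source, which the abstract does not state.

```python
from typing import List, Tuple

Entry = Tuple[str, str]  # (identifier, amino acid sequence); hypothetical record shape


def is_redundant(candidate: str, kept: str, max_mismatches: int = 0) -> bool:
    """Assumed criterion: same length and at most max_mismatches differing positions."""
    if len(candidate) != len(kept):
        return False
    mismatches = sum(a != b for a, b in zip(candidate, kept))
    return mismatches <= max_mismatches


def merge_by_priority(sources: List[List[Entry]], max_mismatches: int = 0) -> List[Entry]:
    """Merge sources listed from highest to lowest priority.

    An entry is rejected when it is redundant with an entry already kept,
    so the copy from the higher-priority source always wins.
    """
    composite: List[Entry] = []
    for source in sources:
        for ident, seq in source:
            if not any(is_redundant(seq, kept_seq, max_mismatches)
                       for _, kept_seq in composite):
                composite.append((ident, seq))
    return composite


# Toy data, invented for illustration only.
high = [("P1", "MKTAYIAK"), ("P2", "GSHMLEDP")]
low = [("Q1", "MKTAYIAK"), ("Q2", "MKTAYIAR"), ("Q3", "AAAA")]

strict = merge_by_priority([high, low], max_mismatches=0)
relaxed = merge_by_priority([high, low], max_mismatches=1)
```

With `max_mismatches=0` only the exact duplicate `Q1` is rejected; with `max_mismatches=1` the trivially different `Q2` is rejected as well. The comparison is quadratic in the number of entries, so a realistic system would need indexing or hashing, which this sketch omits.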

Year:  1990        PMID: 2330366     DOI: 10.1093/protein/3.3.153

Source DB:  PubMed          Journal:  Protein Eng        ISSN: 0269-2139


  36 in total

1.  The search for a new model structure of beta-factor XIIa.

Authors:  E S Henriques; W B Floriano; N Reuter; A Melo; D Brown; J A Gomes; B Maigret; M A Nascimento; M J Ramos
Journal:  J Comput Aided Mol Des       Date:  2001-04       Impact factor: 3.686

2.  Mutagenesis of Glu403 to Cys in rabbit neutral endopeptidase-24.11 (neprilysin) creates a disulphide-linked homodimer: analogy with endothelin-converting enzyme.

Authors:  M V Hoang; C E Sansom; A J Turner
Journal:  Biochem J       Date:  1997-11-01       Impact factor: 3.857

3.  Structure and sequence relationships in the lipocalins and related proteins.

Authors:  D R Flower; A C North; T K Attwood
Journal:  Protein Sci       Date:  1993-05       Impact factor: 6.725

4.  A structural census of the current population of protein sequences.

Authors:  M Gerstein; M Levitt
Journal:  Proc Natl Acad Sci U S A       Date:  1997-10-28       Impact factor: 11.205

5.  Relationships between bacterial drug resistance pumps and other transport proteins.

Authors:  J H Parish; J Bentley
Journal:  J Mol Evol       Date:  1996-02       Impact factor: 2.395

6.  Assessing the impact of secondary structure and solvent accessibility on protein evolution.

Authors:  N Goldman; J L Thorne; D T Jones
Journal:  Genetics       Date:  1998-05       Impact factor: 4.562

7.  MultiCoil: a program for predicting two- and three-stranded coiled coils.

Authors:  E Wolf; P S Kim; B Berger
Journal:  Protein Sci       Date:  1997-06       Impact factor: 6.725

8.  Maximum-likelihood estimation of the statistical distribution of Smith-Waterman local sequence similarity scores.

Authors:  R Mott
Journal:  Bull Math Biol       Date:  1992-01       Impact factor: 1.758

9.  A three-dimensional model of the Photosystem II reaction centre of Pisum sativum.

Authors:  S V Ruffle; D Donnelly; T L Blundell; J H Nugent
Journal:  Photosynth Res       Date:  1992-11       Impact factor: 3.573

10.  Humanization of murine monoclonal antibodies through variable domain resurfacing.

Authors:  M A Roguska; J T Pedersen; C A Keddy; A H Henry; S J Searle; J M Lambert; V S Goldmacher; W A Blättler; A R Rees; B C Guild
Journal:  Proc Natl Acad Sci U S A       Date:  1994-02-01       Impact factor: 11.205
